PREFACE


The application of genetical principles to the study of human metrical characters, such as stature, 
was first attempted by Galton who, in 1887, used a method of correlation for measuring likeness 
between relatives. The theoretical basis of the results remained obscure until Mendelian principles
of inheritance were applied. Karl Pearson’s first attempt, in 1904, to account for the
observed correlation values in this way was not satisfactory, but he succeeded in explaining the 
results in 1909 after the idea of random mating had been introduced into human genetics. It 
was not until 1918, however, that the matter was properly cleared up by Fisher’s classical study, 
published in the Proceedings of the Royal Society of Edinburgh. Many aspects of the subject were 
dealt with in this paper, such as the effects of dominance and assortative mating on the correlation
values. In some sections the exposition is very difficult to follow. The value of Fisher’s
contribution to the subject, however, is so great that Professor Moran and Professor Smith have 
thought it worth while to discuss his text in detail and criticize it where they think necessary. 
For this purpose the reprinting of the original paper is necessary and the running commentary 
provided should prove of great value both to students of genetics and of statistics. 


L. S. PENROSE 


INTRODUCTION 


Sir Ronald Fisher’s 1918 paper on the correlations between relatives is one of the classical papers 
of scientific literature. A few papers had previously appeared giving the expected values of the 
correlations on very simple Mendelian assumptions. Fisher succeeded in dealing with all the 
more obvious complications such as complete or partial recessivity, multiple allelism, epistacy, 
linkage, and assortative mating, and indeed with combinations of these, in one single paper. 
Since these complications are known or virtually certain to occur in real examples, this was a 
most important and necessary advance. Furthermore, this paper was published when Fisher 
was still only 28 years of age. The treatment suffers from a few minor defects. The model for 
assortative mating is rather a special one, though very ingenious; the argument dealing with 
linked genes is incomplete; and there is no mention of sex-linkage. But the first two of these 
defects are not easy to repair, and there has been no appreciable advance on Fisher’s treatment 
of these points in the 47 years since his paper appeared. 

It is also of interest that we can see in this paper the beginning of some of Fisher’s most important
statistical ideas. Thus he sets out the idea of partitioning variance into components.
This presumably led to the Analysis of Variance. Fisher uses in this paper a technique which is 
very closely related to the analysis of variance applied to linear regression. 

We are very much indebted to the Royal Society of Edinburgh and to Fisher’s executor the
Public Trustee of South Australia for permission to reproduce his original paper, and to Professor
L. S. Penrose for his encouragement. The text of Fisher’s paper has here been set in small
type, enclosed in double quotation marks. (Some small changes have been made in the mathematical
typography, in order to make it more consistent with the usual present-day practice used
in the commentary. But there has been no alteration in the substance.) The commentary has been
printed in larger type.

We hope that we have everywhere interpreted Fisher’s ideas correctly and will succeed in
making the paper easier to follow.

P. A. P. MORAN
C. A. B. SMITH
“Several attempts have already been made to interpret the well-established results of biometry in accordance
with the Mendelian scheme of inheritance. It is here attempted to ascertain the biometrical properties of a 
population of a more general type than has hitherto been examined, inheritance in which follows this scheme. 
It is hoped that in this way it will be possible to make a more exact analysis of the causes of human variability. 
The great body of available statistics show us that the deviations of a human measurement from its mean 
follow very closely the Normal Law of Errors, and, therefore, that the variability may be uniformly measured 
by the standard deviation corresponding to the square root of the mean square error. When there are two 
independent causes of variability capable of producing in an otherwise uniform population distributions with 
standard deviations $\sigma_1$ and $\sigma_2$, it is found that the distribution, when both causes act together, has a standard
deviation $\sqrt{\sigma_1^2 + \sigma_2^2}$.”


This assumes that the causes act additively and not, for example, multiplicatively. 


“It is therefore desirable in analysing the causes of variability to deal with the square of the standard
deviation as the measure of variability. We shall term this quantity the Variance of the normal population to 
which it refers, and we may now ascribe to the constituent causes fractions or percentages of the total variance
which they together produce. It is desirable on the one hand that the elementary ideas at the basis of the
calculus of correlations should be clearly understood, and easily expressed in ordinary language, and on the 
other that loose phrases about the ‘percentage of causation’, which obscure the essential distinction between
the individual and the population, should be carefully avoided.

“Speaking always of normal populations, when the coefficient of correlation between father and son, in
stature let us say, is $r$, it follows that for the group of sons of fathers of any given height the variance is a
fraction, $1 - r^2$, of the variance of sons in general. Thus if the correlation is 0.5, we have accounted by reference
to the height of the father for one quarter of the variance of the sons.”


This does not mean that one quarter of the variance is due to the direct genetic link between 
father and son. Some of the correlation may arise indirectly because of a resemblance between 
father and mother, and there is a direct genetic link between mother and son. 


“For the remaining three quarters we must account by some other cause. If the two parents are independent,
a second quarter may be ascribed to the mother. If father and mother, as usually happens, are positively 
correlated, a less amount must be added to obtain the joint contribution of the two parents, since some of the 
mother’s contribution will in this case have been already included with the father’s. In a similar way each of 
the ancestors makes an independent contribution, but the total amount of variance to be ascribed to the 
measurements of ancestors, including parents, cannot greatly exceed one half of the total. We may know this 
by considering the difference between brothers of the same fraternity: of these the whole ancestry is identical, 
so that we may expect them to resemble one another rather more than persons whose ancestry, identical in 
respect of height, consists of different persons. For stature the coefficient of correlation between brothers is 
about 0.54, which we may interpret* by saying that 54 per cent of their variance is accounted for by ancestry
alone, and that 46 per cent must have some other explanation.” 


Fisher is using ‘accounted for’ in the technical sense that $R^2 = 0.54$ is the squared multiple correlation
of the measured value on the values of all ancestors. Fisher will show later that most of the
remaining variability is also due to the parents, being caused by their heterozygosity. This 
does not contribute to the regression of child on parent, and thus, in the sense of the 
theory of regression, this part of the child’s variability is not ‘accounted for’ by the parents’ 
variability. 

Suppose that $x$ is a biological measurement on a son obtained by choosing a family at random
out of a large population of families and choosing a son at random out of this family. Let $x$ be
measured from its mean and have variance $\sigma^2$. If $X$ is the measurement on another son chosen
from the same family the expected value of $(x - X)^2$ will be $2V$, where $V$ is the variance of a son
around the family mean. This mean value of $(x - X)^2$ is the mean over all families.

On the other hand, if $x$ and $X$ are the measurements on two brothers in the same family the
mean value of $(x - X)^2$ taken over all families must be $2\sigma^2(1 - r)$, where $r$ is the correlation between
brothers. Thus $2V = 2\sigma^2(1 - r)$, and $V/\sigma^2 = 1 - r$.
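
The identity $V/\sigma^2 = 1 - r$ can be checked exactly on a small artificial population. The following is only a sketch: the three families, their weights and the sons' values are hypothetical, and each family is treated as an infinite fraternity described by a probability distribution.

```python
from fractions import Fraction as F

# Hypothetical toy population: (family weight, [(son's value, probability), ...]).
FAMILIES = [
    (F(1, 2), [(F(-2), F(1, 2)), (F(0), F(1, 2))]),
    (F(1, 4), [(F(1), F(1, 3)), (F(4), F(2, 3))]),
    (F(1, 4), [(F(-1), F(1, 4)), (F(3), F(3, 4))]),
]

def fraternal_summary(families):
    """Return (sigma2, V, r, mean of (x - X)^2) for two sons drawn
    independently from the same randomly chosen family."""
    mean = sum(w * p * v for w, dist in families for v, p in dist)
    sigma2 = sum(w * p * (v - mean) ** 2 for w, dist in families for v, p in dist)
    V = F(0)      # average variance of a son about his own family mean
    cov = F(0)    # covariance between two brothers
    msd = F(0)    # mean value of (x - X)^2 over all families
    for w, dist in families:
        fam_mean = sum(p * v for v, p in dist)
        V += w * sum(p * (v - fam_mean) ** 2 for v, p in dist)
        cov += w * (fam_mean - mean) ** 2
        msd += w * sum(p1 * p2 * (v1 - v2) ** 2
                       for v1, p1 in dist for v2, p2 in dist)
    return sigma2, V, cov / sigma2, msd

sigma2, V, r, msd = fraternal_summary(FAMILIES)
print(msd == 2 * V, V / sigma2 == 1 - r)
```

Exact rational arithmetic is used so that the two equalities hold without rounding error.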

* The correlation is determined from the measurements of $n$ individuals, $x_1, x_2, \ldots, x_n$, and of their brothers,
$y_1, y_2, \ldots, y_n$; let us suppose that each pair of brothers is a random sample of two from an infinite fraternity, that
is to say from all the sons which a pair of parents might conceivably have produced, and that the variance of
each such fraternity is $V$, while that of the sons in general is $\sigma^2$. Then the mean value of $(x - y)^2$ will be $2V$, since
each brother contributes the variance $V$. But expanding the expression, we find the mean value of both $x^2$ and
$y^2$ is $\sigma^2$, while that of $xy$ is $r\sigma^2$, where $r$ is the fraternal correlation. Hence $2V = 2\sigma^2(1 - r)$, or $V/\sigma^2 = 1 - r$.
Taking the values 0.5066 and 0.2804 for the parental and marital correlations, we find that the heights of the
parents alone account for 40.10 per cent of the variance of the children, whereas the total effect of ancestry,
deduced from the fraternal correlation, is 54.33 per cent. [All footnotes are from the original paper by Fisher.]

Suppose now that $x$ and $X$ are measurements on two parents, and $z$ on their offspring. Then the
proportion of the variance of $z$ accounted for by the two parents is the multiple correlation of $z$
with $x$ and $X$, i.e. in this case the correlation of $z$ with $x + X$. The variances of $x$, $X$, and $z$ are $\sigma^2$
each and the variance of $x + X$ is $2\sigma^2(1 + r_m)$, where $r_m$ is the correlation between $x$ and $X$, i.e. the
‘marital’ correlation. The covariance of $z$ with $(x + X)$ is the mean value of $z(x + X)$ which equals
$2\sigma^2 r_p$, where $r_p$ is the correlation between a son and a parent. The multiple correlation is therefore
$r_p(1 + r_m)^{-1}$ and in the particular case considered this is
$$0.5066\,(1.2804)^{-1} = 0.3956,$$

which differs slightly from Fisher’s value 0.4010. What Fisher calls the ‘total effect of ancestry’
is given by the observed fraternal correlation, which is 0.5433, because this is the square of the
multiple correlation coefficient with all the ancestors and is therefore the fractional reduction in
variance when all the ancestral values are held fixed. The standard errors of these estimates are
not given.
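
The arithmetic can be re-run directly. The sketch below assumes the standard expression for the squared multiple correlation of one variable on two predictors, $R^2 = (r_1^2 + r_2^2 - 2 r_1 r_2 r_{12})/(1 - r_{12}^2)$, which is not written out in the text; the function and variable names are illustrative. With both parental correlations equal to $r_p$ this reduces to $2r_p^2/(1 + r_m)$, and it is evaluated alongside the ratio $r_p/(1 + r_m)$ quoted above.

```python
def squared_multiple_correlation(r_zx, r_zX, r_xX):
    """R^2 of z on two predictors, computed from the three pairwise correlations."""
    return (r_zx ** 2 + r_zX ** 2 - 2 * r_zx * r_zX * r_xX) / (1 - r_xX ** 2)

r_p, r_m = 0.5066, 0.2804   # parental and marital correlations quoted by Fisher

# Squared multiple correlation of offspring on both parents;
# algebraically equal to 2 * r_p**2 / (1 + r_m).
r2_parents = squared_multiple_correlation(r_p, r_p, r_m)

# The ratio evaluated in the commentary.
ratio_in_text = r_p / (1 + r_m)

print(round(r2_parents, 4), round(ratio_in_text, 4))
```

On these inputs the first quantity is about 0.4009, close to Fisher’s 40.10 per cent, and the second about 0.3957.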


“It is not sufficient to ascribe this last residue to the effects of environment. Numerous investigations by
Galton and Pearson have shown that all measurable environment has much less effect on such measurements 
as stature. Further, the facts collected by Galton respecting identical twins show that in this case, where the 
essential nature is the same, the variance is far less. The simplest hypothesis, and the one which we shall 
examine, is that such features as stature are determined by a large number of Mendelian factors, and that the 
large variance among children of the same parents is due to the segregation of those factors in respect to which 
the parents are heterozygous. Upon this hypothesis we will attempt to determine how much more of the 
variance, in different measurable features, beyond that which is indicated by the fraternal correlation, is due 
to innate and heritable factors. 

“In 1903 Karl Pearson devoted to a first examination of this hypothesis the twelfth of his Mathematical
Contributions to the Theory of Evolution (‘On a Generalised Theory of Alternative Inheritance, with special
reference to Mendel’s Laws,’ Phil. Trans., vol. cciii, A, pp. 53-87. The subject had been previously opened by
Udny Yule, New Phytologist, vol. 1). For a population of $n$ equally important Mendelian pairs, the dominant
and recessive phases being present in equal numbers, and the different factors combining their effects by
simple addition, he found that the correlation coefficients worked out uniformly too low. The parental correlations
were $\tfrac{1}{3}$ and the fraternal $\tfrac{5}{12}$.*

“These low values, as was pointed out by Yule at the Conference on Genetics in 1906 (Horticultural Society’s 
Report), could be satisfactorily explained as due to the assumption of complete dominance. It is true that 
dominance is a very general Mendelian phenomenon, but it is purely somatic, and if better agreements can be 
obtained without assuming it in an extreme and rigorous sense, we are justified in testing a wider hypothesis. 
Yule, although dealing with by no means the most general case, obtained results which are formally almost 
general. He shows the similarity of the effects of dominance and of environment in reducing the correlations 
between relatives, but states that they are identical, an assertion to which, as I shall show, there is a remarkable
exception, which enables us, as far as existing statistics allow, to separate them and to estimate how much of 
the total variance is due to dominance and how much to arbitrary outside causes. 

* The case of the fraternal correlations has been unfortunately complicated by the belief that the correlation
on a Mendelian hypothesis would depend on the number of the fraternity. In a family, for instance, in which
four Mendelian types are liable to occur in equal numbers, it was assumed that of a family of four, one would
be of each type; in a family of eight, two of each type; and so on. If this were the case, then in such families,
one being of the type A would make it less likely, in small families impossible, for a second to be of this type.
If, as was Mendel’s hypothesis, the different qualities were carried by different gametes, each brother would
have an independent and equal chance of each of the four possibilities. Thus the formulae giving the fraternal
correlations in terms of the number of the fraternity give values too small. The right value on Mendel’s theory
is that for an infinite fraternity. As Pearson suggested in the same paper, ‘probably the most correct way of
looking at any fraternal correlation table would be to suppose it a random sample of all pairs of brothers
which would be obtained by giving a large, or even indefinitely large, fertility to each pair, for what we actually
do is to take families of varying size and take as many pairs of brothers as they provide.’ In spite of this, the
same confusing supposition appears in a paper by Snow ‘On the Determination of the Chief Correlations
between Collaterals in the Case of a Simple Mendelian Population Mating at Random’ (E. C. Snow, B.A.,
Proc. Roy. Soc. June 1910); and in one by John Brownlee, ‘The Significance of the Correlation Coefficient
when applied to Mendelian Distributions’ (Proc. Roy. Soc. Edinb. Jan. 1910).

“In the following investigation we find it unnecessary to assume that the different Mendelian factors are of
equal importance, and we allow the different phases of each to occur in any proportions consistent with the
conditions of mating. The heterozygote is from the first assumed to have any value between those of the
dominant and the recessive, or even outside this range, which terms therefore lose their polarity, and become
merely the means of distinguishing one pure phase from the other. In order to proceed from the simple to the
complex we assume at first random mating, the independence of the different factors, and that the factors are
sufficiently numerous to allow us to neglect certain small quantities.”


Although Fisher states that random mating is assumed at first, the theory is developed in terms 
more general than this and he is careful to state when the additional assumption is introduced. 
He also assumes for the present that each measured character is the result of summing a large 
number of small factors which are independent, i.e. that there is no linkage. It then follows from 
the standard properties of means, variances and covariances that the mean value of the character 
in the population is equal to the sum of the means of the individual small factors, the variance is 
similarly the sum of the individual variances, and the same is true of the covariances. 

Suppose that for the particular factor considered the two possible alleles are $A_1$ and $A_2$. We
then have the following table:


Zygote               $A_1A_1$    $A_1A_2$    $A_2A_2$
Phenotypic effect    $a$         $d$         $-a$
Frequency            $P$         $2Q$        $R$


If the individuals concerned had been produced by a process involving random mating and no
selection we would have
$$PR = Q^2$$
and
$$p = P + Q, \quad q = Q + R,$$
would be the gene frequencies of the $A_1$ and $A_2$ genes so that
$$P = p^2, \quad Q = pq, \quad R = q^2.$$

As assortative mating is considered later, it is more convenient to develop the theory in terms
of $P$, $Q$ and $R$ without assuming the Hardy-Weinberg formula except when random mating is
explicitly asserted.

The variance $\alpha^2$ given by (I) is the contribution of this factor to the total variance $\sigma^2$ whether
or not the distribution is normal. The fact that the sum of all factors will be
approximately normally distributed (particularly if measured after a suitable transformation)
will follow from the version of the Central Limit Theorem which proves asymptotic normality for
a sum of independent random variables each of which is ‘individually negligible’ in a certain
precise sense. The calculation of the third and fourth moments here is merely illustrative.
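
The additivity of means and variances for independent factors can be illustrated by exhaustive enumeration. This is a minimal sketch with hypothetical values for two factors (frequencies $P$, $2Q$, $R$ and effects $a$, $d$, $-a$ chosen arbitrarily); none of the numbers comes from Fisher’s data.

```python
from fractions import Fraction as F
from itertools import product

def factor_distribution(P, Q, R, a, d):
    """Values and frequencies of the three phases of one factor (P, 2Q, R)."""
    assert P + 2 * Q + R == 1
    return [(a, P), (d, 2 * Q), (-a, R)]

def mean_and_variance(dist):
    """Mean m and variance (alpha^2 for a single factor) of a discrete distribution."""
    m = sum(p * v for v, p in dist)
    return m, sum(p * (v - m) ** 2 for v, p in dist)

def combine(dist1, dist2):
    """Distribution of the sum of two independently combined factors."""
    return [(v1 + v2, p1 * p2) for (v1, p1), (v2, p2) in product(dist1, dist2)]

# Hypothetical illustrative factors.
f1 = factor_distribution(F(1, 2), F(1, 8), F(1, 4), F(3), F(1))
f2 = factor_distribution(F(1, 10), F(3, 10), F(3, 10), F(2), F(-1))

m1, v1 = mean_and_variance(f1)
m2, v2 = mean_and_variance(f2)
m12, v12 = mean_and_variance(combine(f1, f2))
print(m12 == m1 + m2, v12 == v1 + v2)
```

Neither factor is assumed to be in Hardy-Weinberg proportions, and no normality is needed for the two equalities.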


“1. Let us suppose that the difference caused by a single Mendelian factor is represented in its three phases
by the difference of the quantities $a$, $d$, $-a$, and that these phases exist in any population with relative
frequency $P$, $2Q$, $R$, where $P + 2Q + R = 1$.

“Then a population in which this factor is the only cause of variability has its mean at
$$m = Pa + 2Qd - Ra,$$
so that
$$P(a - m) + 2Q(d - m) - R(a + m) = 0.$$
Let now
$$P(a - m)^2 + 2Q(d - m)^2 + R(a + m)^2 = \alpha^2; \tag{I}$$
$\alpha^2$ then is the variance due to this factor, for it is easily seen that when two such factors are combined at random,
the mean square deviation from the new mean is equal to the sum of the values of $\alpha^2$ for the two factors
separately. In general the mean square deviation due to a number of such factors associated at random will be
written
$$\sigma^2 = \sum \alpha^2. \tag{II}$$


“To justify our statement that $\alpha^2$ is the contribution which a single factor makes to the total variance, it
is only necessary to show that when the number of such factors is large the distributions will take the normal
form.

“If now we write
$$\mu_3 = P(a - m)^3 + 2Q(d - m)^3 - R(a + m)^3,$$
$$\mu_4 = P(a - m)^4 + 2Q(d - m)^4 + R(a + m)^4,$$
and if $M_3$ and $M_4$ are the third and fourth moments of the population, the variance of which is due solely
to the random combination of such factors, it is easy to see that
$$M_3 = \sum \mu_3,$$
$$M_4 - 3\sigma^4 = \sum (\mu_4 - 3\alpha^4).$$
Now the departure from normality of the population may be measured by means of the two ratios
$$\beta_1 = \frac{M_3^2}{\sigma^6} \quad \text{and} \quad \beta_2 = \frac{M_4}{\sigma^4}.$$
The first of these is
$$\left(\sum \mu_3\right)^2 \Big/ \left(\sum \alpha^2\right)^3,$$
and is of the order $1/n$, where $n$ is the number of factors concerned, while the second differs from its Gaussian
value 3 also by a quantity of the order $1/n$.”
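
The $1/n$ behaviour is easy to see in the special case of $n$ identical independent factors. The sketch below is illustrative only: the factor’s frequencies and effects are hypothetical, and it uses the relations $M_3 = \sum\mu_3$ and $M_4 - 3\sigma^4 = \sum(\mu_4 - 3\alpha^4)$ quoted above.

```python
from fractions import Fraction as F

def central_moments(P, Q, R, a, d):
    """Return (alpha^2, mu_3, mu_4) for one factor with phases a, d, -a
    occurring with frequencies P, 2Q, R."""
    m = P * a + 2 * Q * d - R * a
    devs = [(a - m, P), (d - m, 2 * Q), (-a - m, R)]
    return tuple(sum(p * x ** k for x, p in devs) for k in (2, 3, 4))

def shape_ratios(n, P, Q, R, a, d):
    """beta_1 and beta_2 for the sum of n independent copies of the same factor."""
    alpha2, mu3, mu4 = central_moments(P, Q, R, a, d)
    sigma2 = n * alpha2                                   # sigma^2 = sum of alpha^2
    M3 = n * mu3                                          # M_3 = sum of mu_3
    M4 = 3 * sigma2 ** 2 + n * (mu4 - 3 * alpha2 ** 2)    # from M_4 - 3 sigma^4
    return M3 ** 2 / sigma2 ** 3, M4 / sigma2 ** 2

example = (F(1, 2), F(1, 8), F(1, 4), F(3), F(1))   # hypothetical factor
for n in (1, 10, 100):
    b1, b2 = shape_ratios(n, *example)
    print(n, float(b1), float(b2))
```

For identical factors $\beta_1$ is exactly the single-factor value divided by $n$, and $\beta_2 - 3$ shrinks in the same way.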


In sections 2 and 3 the following problem is considered. Suppose that the measurement x 
(measured from the population mean) is the sum of the effects of a large number of independently 
segregating factors. For a parent (say a father) and an offspring (say a son) these measurements 
will be distributed, to a high degree of approximation, in a bivariate normal distribution, and we 
wish to calculate the regression coefficient of the value for the son on the value for the father. This 
is done by an ingenious approximate argument. In this it is assumed that the parents mate at 
random but not necessarily the grandparents. 

$x$ (whose variance is $\sigma^2$) is the sum of a large number of independent factor pairs of which a
typical one is $(A_1, A_2)$ whose contribution to the variance is $\alpha^2$. Suppose the proportions of
$(A_1A_1)$, $(A_1A_2)$ and $(A_2A_2)$ in the whole population are $\bar{P}$, $2\bar{Q}$, $\bar{R}$. We now choose a particular value
$x$ for the father. In the subpopulation of fathers having this value, the frequencies of $(A_1A_1)$,
$(A_1A_2)$ and $(A_2A_2)$ will be different and we write $P$, $2Q$ and $R$ for them. Our first task is to calculate
these.

To do this, consider a population of fathers in which all the factors have frequencies the same as
in the above population except for the one factor considered for which all individuals are to be
heterozygotes $(A_1A_2)$. The variance of this population is then $\sigma^2 - \alpha^2$ since the component
variance, $\alpha^2$, due to $A_1$, $A_2$, has been removed in this way. If we now modify this in the manner
described and use the fact that the distribution of $x$ is normal we see that the frequencies of
$(A_1A_1)$, $(A_1A_2)$ and $(A_2A_2)$ must be proportional to
$$\bar{P}\exp\left\{-\frac{(x - a + m)^2}{2(\sigma^2 - \alpha^2)}\right\}, \quad 2\bar{Q}\exp\left\{-\frac{(x - d + m)^2}{2(\sigma^2 - \alpha^2)}\right\}, \quad \bar{R}\exp\left\{-\frac{(x + a + m)^2}{2(\sigma^2 - \alpha^2)}\right\}$$
in order to get the previously considered population. From this we obtain (III) which is an
approximation obtained by supposing that


a?/o? and 27/0? 


are small. These give the proportions of the three types in a population of fathers with value $x$.
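
How close the first-order formulae are to the exponential ratios can be checked numerically. In this sketch the population frequencies are passed as `P_pop`, `Q_pop`, `R_pop` (names chosen here for illustration), the numerical values are hypothetical, and the linear forms are $P_{pop}[1 + x(a-m)/\sigma^2]$, $2Q_{pop}[1 + x(d-m)/\sigma^2]$ and $R_{pop}[1 - x(a+m)/\sigma^2]$, as in Fisher’s (III).

```python
import math

def array_frequencies(x, P_pop, Q_pop, R_pop, a, d, sigma2):
    """Exact (normalised) and first-order frequencies of the three phases
    among fathers whose deviation from the mean is x."""
    m = P_pop * a + 2 * Q_pop * d - R_pop * a
    alpha2 = (P_pop * (a - m) ** 2 + 2 * Q_pop * (d - m) ** 2
              + R_pop * (a + m) ** 2)
    s2 = sigma2 - alpha2                 # variance with this factor removed
    shifts = [a - m, d - m, -a - m]      # deviation contributed by each phase
    weights = [P_pop, 2 * Q_pop, R_pop]
    exact = [w * math.exp(-(x - s) ** 2 / (2 * s2))
             for w, s in zip(weights, shifts)]
    total = sum(exact)
    exact = [e / total for e in exact]
    linear = [w * (1 + x * s / sigma2) for w, s in zip(weights, shifts)]
    return exact, linear

# Hypothetical example: one small factor in a population with sigma^2 = 100.
exact, linear = array_frequencies(5.0, 0.36, 0.24, 0.16, 1.0, 0.3, 100.0)
for e, l in zip(exact, linear):
    print(round(e, 5), round(l, 5))
```

The linear frequencies always sum to one, because $P(a-m) + 2Q(d-m) - R(a+m) = 0$; the discrepancy from the exact values shrinks as the factor’s effect becomes small relative to $\sigma$.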
We suppose that these fathers mate at random with the general population. For the particular
factor considered we then get Table A, in which each cell gives the values of the possible offspring
with their frequencies.


TABLE A

                                                         Father (array)
Mother (from rest             $A_1A_1$, $a$, $P$                  $A_1A_2$, $d$, $2Q$                                        $A_2A_2$, $-a$, $R$
of population)
$A_1A_1$, $a$, $\bar{P}$      $a$: $\bar{P}P$                     $a$: $\bar{P}Q$;  $d$: $\bar{P}Q$                          $d$: $\bar{P}R$
$A_1A_2$, $d$, $2\bar{Q}$     $a$: $\bar{Q}P$;  $d$: $\bar{Q}P$   $a$: $\bar{Q}Q$;  $d$: $2\bar{Q}Q$;  $-a$: $\bar{Q}Q$      $d$: $\bar{Q}R$;  $-a$: $\bar{Q}R$
$A_2A_2$, $-a$, $\bar{R}$     $d$: $\bar{R}P$                     $d$: $\bar{R}Q$;  $-a$: $\bar{R}Q$                         $-a$: $\bar{R}R$


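The sons therefore have values $a$, $d$ and $-a$ with probabilities
$$\bar{P}P + \bar{P}Q + \bar{Q}P + \bar{Q}Q,$$
$$\bar{P}Q + \bar{Q}P + \bar{P}R + \bar{R}P + \bar{Q}R + \bar{R}Q + 2\bar{Q}Q,$$
$$\bar{Q}Q + \bar{Q}R + \bar{R}Q + \bar{R}R.$$
We now insert the values given by (III), ignore terms of higher order than the first in $x/\sigma^2$, and
obtain the formulae given at the beginning of paragraph 3. Multiplying these by $a$, $d$ and $-a$
and adding, we find that the expected value of the mean of the offspring is
$$2d(\bar{P}\bar{R} - \bar{Q}^2) + \frac{x}{\sigma^2}\left[\bar{P}\bar{Q}(a - d)^2 + 2\bar{P}\bar{R}(a^2 - d^2) + \bar{Q}\bar{R}(a + d)^2 + 2(\bar{P}\bar{R} - \bar{Q}^2)\,d(d - m)\right]. \tag{IIIa}$$
(A factor 2 multiplying $(\bar{P}\bar{R} - \bar{Q}^2)$ inside the square bracket is omitted in Fisher.) Note that in
order to obtain (IIIa) it is necessary to use the result
$$m(\bar{P} + 2\bar{Q} + \bar{R}) = a\bar{P} + 2d\bar{Q} - a\bar{R}.$$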
If the parents are the result of a mating at random
$$\bar{P}\bar{R} - \bar{Q}^2 = 0,$$
and (IIIa) simplifies to (IV).
Thus (IV) has been obtained by an approximate argument. However (IV) is exact in the sense
that it gives the ratio of the part of the covariance, which is due to the factor considered, to $\sigma^2$.
This means that if $x$ is the value of the father and $X$ of the son, where
$$x = \sum_i x_i, \quad X = \sum_i X_i,$$
and $x_i$, $X_i$ are the values of the contribution made by factor $i$, then
$$\text{covariance}\,(x_i, X_i) = \left[\bar{P}\bar{Q}(a - d)^2 + 2\bar{P}\bar{R}(a^2 - d^2) + \bar{Q}\bar{R}(a + d)^2\right].$$

(IIIa) is exact whether $\bar{P}\bar{R} - \bar{Q}^2 = 0$, or not.
We shall now prove this, and in doing so we shall revert to the notation $P$, $Q$, $R$ instead of
$\bar{P}$, $\bar{Q}$, $\bar{R}$, as Fisher does from paragraph 4 onwards. In this way we will see that (IIIa) is also
exact. In the association table (Table B), the columns correspond to the genotypes of the male
parent and the rows to the genotype of the female parent. In each cell the genotypes of the
offspring with their probabilities are given, assuming random mating in the parents. From this
we can immediately extract an association table for parent/offspring (Table C).


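TABLE B

                                                 Male parent
Female parent           $A_1A_1$, $a$, $P$        $A_1A_2$, $d$, $2Q$                        $A_2A_2$, $-a$, $R$
$A_1A_1$, $a$, $P$      $a$: $P^2$                $a$: $PQ$;  $d$: $PQ$                      $d$: $PR$
$A_1A_2$, $d$, $2Q$     $a$: $PQ$;  $d$: $PQ$     $a$: $Q^2$;  $d$: $2Q^2$;  $-a$: $Q^2$     $d$: $QR$;  $-a$: $QR$
$A_2A_2$, $-a$, $R$     $d$: $PR$                 $d$: $QR$;  $-a$: $QR$                     $-a$: $R^2$


TABLE C

                               Parent
Offspring      $A_1A_1$        $A_1A_2$              $A_2A_2$
$A_1A_1$       $P^2 + PQ$      $PQ + Q^2$            $0$
$A_1A_2$       $PQ + PR$       $PQ + 2Q^2 + QR$      $PR + QR$
$A_2A_2$       $0$             $Q^2 + QR$            $QR + R^2$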


Notice that unlike Tables A and B, Table C is not symmetric about the leading diagonal but 
does still have another type of symmetry about the other diagonal resulting from the symmetric 
role of the two factors, $A_1$ and $A_2$ (the two previous tables of course also have this type of
symmetry). 

From Table C in turn we can find the mean value of the offspring multiplied by the probability 
of the parent, for each of the parental types, and this is shown in Table D. The sum of the third 
column gives the mean value of the offspring which is 


$$m + 2d(PR - Q^2).$$


TABLE D

Parental type          Probability of parental type multiplied
and its value          by mean value of offspring
$A_1A_1$    $a$        $aP^2 + aPQ + dPQ + dPR$
$A_1A_2$    $d$        $aPQ - aQR + d(PQ + 2Q^2 + QR)$
$A_2A_2$    $-a$       $dPR + dQR - aQR - aR^2$


The covariance uncorrected for the means is the sum of the products of the first and second 
columns. Calculating this and subtracting the correction for the means which is 


$$m\{m + 2d(PR - Q^2)\},$$


we verify the formula before (IV) which is therefore exact when corrected as in (IIIa).
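
The verification can also be done mechanically. The following sketch rebuilds the parent/offspring association table from the gene frequencies and compares the exact covariance with the bracket in (IIIa); the function names and the numerical frequencies are hypothetical, and exact fractions are used so that the comparison is an identity rather than an approximation.

```python
from fractions import Fraction as F

def parent_offspring_table(P, Q, R):
    """Joint probabilities of (parent genotype, offspring genotype) when the
    parent mates at random. Genotype order: A1A1, A1A2, A2A2."""
    p, q = P + Q, Q + R                      # gene frequencies of A1 and A2
    transmit_A1 = [F(1), F(1, 2), F(0)]      # chance each genotype passes on A1
    table = {}
    for i, (freq, t) in enumerate(zip([P, 2 * Q, R], transmit_A1)):
        table[i, 0] = freq * t * p
        table[i, 1] = freq * (t * q + (1 - t) * p)
        table[i, 2] = freq * (1 - t) * q
    return table

def covariance_check(P, Q, R, a, d):
    """Return (offspring mean, m, exact covariance, random-mating bracket,
    correction term in PR - Q^2)."""
    values = [a, d, -a]
    table = parent_offspring_table(P, Q, R)
    m = P * a + 2 * Q * d - R * a
    offspring_mean = sum(pr * values[j] for (i, j), pr in table.items())
    cov = (sum(pr * values[i] * values[j] for (i, j), pr in table.items())
           - m * offspring_mean)
    core = (P * Q * (a - d) ** 2 + 2 * P * R * (a * a - d * d)
            + Q * R * (a + d) ** 2)
    extra = 2 * (P * R - Q * Q) * d * (d - m)
    return offspring_mean, m, cov, core, extra

# Hypothetical frequencies that are NOT in Hardy-Weinberg proportions.
om, m, cov, core, extra = covariance_check(F(1, 2), F(1, 8), F(1, 4), F(3), F(2))
print(cov == core + extra, om == m + 2 * F(2) * (F(1, 2) * F(1, 4) - F(1, 8) ** 2))
```

With Hardy-Weinberg frequencies the correction term vanishes and the covariance reduces to the bracket appearing in (IV).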
When the parents have been produced by random mating we have $PR - Q^2 = 0$ which is the
Hardy-Weinberg relation and we can then write
$$P = p^2, \quad Q = pq, \quad R = q^2,$$
where $p = P + Q$, $q = Q + R$.


“2. If there are a great number of different factors, so that $\sigma$ is large compared to every separate $\alpha$, we may
investigate the proportions in which the different phases occur in a selected array of individuals. Since the
deviation of an individual is simply due to a random combination of the deviations of separate factors, we
must expect a given array of deviation, let us say $x$, to contain the phases of each factor in rather different
proportions to those in which they exist in the whole population. The latter will be represented now by $\bar{P}$, $2\bar{Q}$, $\bar{R}$,
while $P$, $2Q$, $R$ stand for the proportions in some particular array under consideration.

“Consider a population which is the same in every respect as the one we are dealing with save that all its
members have one particular factor in the heterozygous phase, and let us modify it by choosing of each array
a proportion $\bar{P}$ which are to become dominants and to increase by $a - d$, and a proportion $\bar{R}$ which become
recessive and diminish by $a + d$: the mean is thereby moved to the extent $m - d$.

“Of those which after this modification find themselves in the array with deviation $x$, the dominants
formerly had a deviation $x - a + m$, the heterozygotes $x - d + m$, and the recessives $x + a + m$, and since the
variance of the original population was $\sigma^2 - \alpha^2$, the frequencies of these three types are in the ratio


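$$\bar{P}\exp\left\{-\frac{(x - a + m)^2}{2(\sigma^2 - \alpha^2)}\right\} : 2\bar{Q}\exp\left\{-\frac{(x - d + m)^2}{2(\sigma^2 - \alpha^2)}\right\} : \bar{R}\exp\left\{-\frac{(x + a + m)^2}{2(\sigma^2 - \alpha^2)}\right\},$$
or, when $\sigma$ is great compared to $a$, so that $a^2/\sigma^2$ may be neglected,
$$P = \bar{P}\left[1 + \frac{x}{\sigma^2}(a - m)\right],$$
$$Q = \bar{Q}\left[1 + \frac{x}{\sigma^2}(d - m)\right], \tag{III}$$
$$R = \bar{R}\left[1 - \frac{x}{\sigma^2}(a + m)\right],$$
giving the proportions in which the phases occur in the array of deviation $x$.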
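“3. Hence the members of this array mating at random will have offspring distributed in the three phases
in the proportion
$$\bar{P}^2\left[1 + \frac{x}{\sigma^2}(a - m)\right] + \bar{P}\bar{Q}\left[2 + \frac{x}{\sigma^2}(a - m + d - m)\right] + \bar{Q}^2\left[1 + \frac{x}{\sigma^2}(d - m)\right],$$
$$\bar{P}\bar{Q}\left[2 + \frac{x}{\sigma^2}(a - m + d - m)\right] + 2\bar{Q}^2\left[1 + \frac{x}{\sigma^2}(d - m)\right] + \bar{P}\bar{R}\left[2 - \frac{x}{\sigma^2}(2m)\right] + \bar{Q}\bar{R}\left[2 + \frac{x}{\sigma^2}(d - m - a - m)\right],$$
$$\bar{Q}^2\left[1 + \frac{x}{\sigma^2}(d - m)\right] + \bar{Q}\bar{R}\left[2 + \frac{x}{\sigma^2}(d - m - a - m)\right] + \bar{R}^2\left[1 - \frac{x}{\sigma^2}(a + m)\right];$$
and therefore the deviation of the mean of the offspring is
$$2d(\bar{P}\bar{R} - \bar{Q}^2) + \frac{x}{\sigma^2}\left[\bar{P}\bar{Q}(a - d)^2 + 2\bar{P}\bar{R}(a^2 - d^2) + \bar{Q}\bar{R}(a + d)^2 + (\bar{P}\bar{R} - \bar{Q}^2)\,d(d - m)\right].$$
“Omitting the terms in $(\bar{P}\bar{R} - \bar{Q}^2)$, which for random mating is zero, the regression due to a single factor is
$$\frac{1}{\sigma^2}\left[\bar{P}\bar{Q}(a - d)^2 + 2\bar{P}\bar{R}(a^2 - d^2) + \bar{Q}\bar{R}(a + d)^2\right]. \tag{IV}$$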


“4. To interpret this expression, consider what is involved in taking a, d, −a as representing the three
phases of a factor. Genetically the heterozygote is intermediate between the dominant and the recessive, 
somatically it differs from their mean by d. The steps from recessive to heterozygote and from heterozygote 
to dominant are genetically identical, and may change from one to the other in passing from father to son. 
Somatically the steps are of different importance, and the soma to some extent disguises the true genetic 
nature. There is in dominance a certain latency. We may say that the somatic effects of identical genetic 
changes are not additive, and for this reason the genetic similarity of relations is partly obscured in the 
statistical aggregate. A similar deviation from the addition of superimposed effects may occur between 
different Mendelian factors. We may use the term Epistacy to describe such deviation, which although
potentially more complicated, has similar statistical effects to dominance. If the two sexes are considered as
Mendelian alternatives, the fact that other Mendelian factors affect them to different extents may be regarded
as an example of epistacy.”


The value, $d$, for the heterozygote $A_1A_2$ will not be exactly intermediate between the value $a$
for $A_1A_1$ and the value $-a$ for $A_2A_2$ unless $d = 0$. Fisher proposes to replace these by values
$c + b$, $c$, $c - b$, for which the heterozygote is exactly intermediate. These values are fitted by least squares, i.e. by minimizing
the sum (Fisher uses $S$ without a suffix for summation),
$$S_1 = P(c + b - a)^2 + 2Q(c - d)^2 + R(c - b + a)^2.$$
This procedure is equivalent to considering the linear regression of the measured value on the
number of $A_1$ genes present, and the reason for its usefulness will appear later. To minimize $S_1$ we
have to solve the equations
$$\frac{1}{2}\frac{\partial S_1}{\partial b} = P(c + b - a) - R(c - b + a) = 0,$$
$$\frac{1}{2}\frac{\partial S_1}{\partial c} = P(c + b - a) + 2Q(c - d) + R(c - b + a) = 0.$$
The solution is
$$c = \frac{Q(P + R)\,d}{T}, \quad b = a - \frac{Q(P - R)\,d}{T},$$
where $T = PQ + 2PR + QR$.
Fisher’s formula for $b$ should have the first plus sign changed to minus. Notice that $b$ and $c$
depend not only on $a$ and $d$, but also on the frequencies $P$, $2Q$, $R$.
Notice also that if $PR = Q^2$, $T = Q$.
Using these values we find the deviations from the regression line for $A_1A_1$, $A_1A_2$, and $A_2A_2$
to be
$$c + b - a = 2RQd/T, \quad c - d = -2PRd/T, \quad c - b + a = 2PQd/T. \tag{IVa}$$
These deviations have the expected value
$$P(c + b - a) + 2Q(c - d) + R(c - b + a) = 2dT^{-1}(PRQ - 2QPR + RPQ) = 0.$$
Their variance is therefore
$$\delta^2 = P(c + b - a)^2 + 2Q(c - d)^2 + R(c - b + a)^2 = 4PQRd^2/T. \tag{IVb}$$
This is also by definition the minimum value of $S_1$, as follows from the ordinary least squares
regression theory.
The covariance between the ‘representative values’ and the ‘deviations from linearity’ is then
the mean product (since the mean deviation is zero). This is
$$P(c + b - a)(c + b) + 2Q(c - d)c + R(c - b + a)(c - b) = 2dT^{-1}[PQR(c + b) - 2QPRc + RPQ(c - b)] = 0.$$
Thus the correlation is zero as again follows from the usual regression theory.
The total ‘genotypic’ variance is
$$\alpha^2 = P(a - m)^2 + 2Q(d - m)^2 + R(-a - m)^2,$$
and can be decomposed into two parts. The first of these is the variance of the representative
values,
$$\beta^2 = P(c + b - m)^2 + 2Q(c - m)^2 + R(c - b - m)^2,$$
which is nowadays called the ‘genetic’ variance due to the $A_1$, $A_2$ genes. The second is the variance
of the ‘dominance deviations’, $\delta^2$, as given by (IVb) above. We can verify algebraically that
$$\alpha^2 = \beta^2 + \delta^2,$$
which is again a consequence of the usual regression theory, especially when the latter is presented
in an analysis of variance table.
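
The algebraic verification can be sketched in code. The closed forms used for $b$ and $c$ below are those that solve the two normal equations (with the sign of the second term of $b$ taken as minus, as the commentary indicates); the function name and the numerical frequencies are hypothetical.

```python
from fractions import Fraction as F

def dominance_decomposition(P, Q, R, a, d):
    """Least-squares fit of c+b, c, c-b to a, d, -a with weights P, 2Q, R,
    returning the pieces of the variance decomposition."""
    T = P * Q + 2 * P * R + Q * R
    b = a - Q * (P - R) * d / T
    c = Q * (P + R) * d / T
    m = P * a + 2 * Q * d - R * a
    alpha2 = P * (a - m) ** 2 + 2 * Q * (d - m) ** 2 + R * (-a - m) ** 2
    beta2 = P * (c + b - m) ** 2 + 2 * Q * (c - m) ** 2 + R * (c - b - m) ** 2
    delta2 = P * (c + b - a) ** 2 + 2 * Q * (c - d) ** 2 + R * (c - b + a) ** 2
    return {"T": T, "b": b, "c": c,
            "alpha2": alpha2, "beta2": beta2, "delta2": delta2}

# Hypothetical frequencies, not in Hardy-Weinberg proportions.
P, Q, R, a, d = F(1, 2), F(1, 8), F(1, 4), F(3), F(2)
res = dominance_decomposition(P, Q, R, a, d)
print(res["alpha2"] == res["beta2"] + res["delta2"],
      res["delta2"] == 4 * P * Q * R * d ** 2 / res["T"],
      res["beta2"] == 2 * res["b"] ** 2 * res["T"])
```

Under random mating ($PR = Q^2$) the same function gives $T = Q$ and $\delta^2 = 4Q^2d^2$.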

If random mating holds, $T = Q$, and $PR = Q^2$ so that
$$\alpha^2 = 2a^2Q - 4Q(P - R)ad + 2Q(P + R)d^2$$
and
$$\beta^2 = 2a^2Q - 4Q(P - R)ad + 2Q(P - R)^2d^2.$$
(Fisher has $2a^2Q^2$ in this formula (formula (VI)) which is wrong.) Then
$$\alpha^2 - \beta^2 = 4Q^2d^2 = \delta^2.$$

The total variance, $\sigma^2$, of the character in the population is the sum, $\sum\alpha^2$, over all pairs of genes
like $A_1$, $A_2$, since we suppose the character is additive. Fisher writes
$$\tau^2 = \sum\beta^2, \quad \epsilon^2 = \sum\delta^2,$$
so that
$$\sigma^2 = \tau^2 + \epsilon^2.$$


“The contributions of imperfectly additive genetic factors divide themselves for statistical purposes into
two parts: an additive part which reflects the genetic nature without distortion, and gives rise to the correlations
which one obtains; and a residue which acts in much the same way as an arbitrary error introduced
into the measurements. Thus, if for $a$, $d$, $-a$ we substitute the linear series
$$c + b, \; c, \; c - b,$$
and choose $b$ and $c$ in such a way that
$$P(c + b - a)^2 + 2Q(c - d)^2 + R(c - b + a)^2$$
is a minimum, we find for this minimum value $\delta^2$,
$$\delta^2 = \frac{4PQR\,d^2}{PQ + 2PR + QR},$$
which is the contribution to the variance of the irregular behaviour of the soma; and for the contribution of the
additive part, $\beta^2$, where
$$\beta^2 = P(c + b - m)^2 + 2Q(c - m)^2 + R(c - b - m)^2,$$
we obtain
$$\beta^2 = 2b^2(PQ + 2PR + QR),$$
and since
$$b = a + \frac{Q(P - R)\,d}{PQ + 2PR + QR},$$
we have
$$\beta^2 = 2a^2(PQ + 2PR + QR) - 4Q(P - R)ad + \frac{2Q^2(P - R)^2d^2}{PQ + 2PR + QR}.$$

“5. These expressions may be much simplified by using the equation 

SS 
for then 6? = 4Q?d? (V) 
f? = 2a°Q?— 4Q(P — R) ad + 2Q(P — R)?d*, (VI) 


which appears in the regression in Article 3 (IV), and 
a? = 2a°Q(P—R) ad + 2Q(P +R) d*. (VII) 
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““TIn general a? = p+ $2, 
and if (oi Mom po 5 (VIII) 
; TA = >) 83) (IX) 
and e? = X62, (X) 
then a= 7? +e.” 


The reasons for introducing this type of regression analysis are most easily seen from later
formulations of the problem by Malécot (Les mathématiques de l'hérédité, Paris (1948)) and Li and
Sacks (Biometrics, 10 (1954), 347-360). At any given locus any individual has two genes which
can be distinguished by their origin, one from the individual’s father and one from his mother.
The effect produced by this pair of genes can be split up into three components in the following
way:

First gene   Second gene   Frequency   Effect $= x_1 + x_2 + x_3$
$A_1$        $A_1$         $P$         $a = \tfrac12(c+b) + \tfrac12(c+b) + (a-c-b)$
$A_1$        $A_2$         $Q$         $d = \tfrac12(c+b) + \tfrac12(c-b) + (d-c)$
$A_2$        $A_1$         $Q$         $d = \tfrac12(c-b) + \tfrac12(c+b) + (d-c)$
$A_2$        $A_2$         $R$         $-a = \tfrac12(c-b) + \tfrac12(c-b) + (-a-c+b)$

The first component is $\tfrac12(c+b)$ or $\tfrac12(c-b)$ according as the first gene is $A_1$ or $A_2$, and similarly
for the second gene. The third component is a deviation from linearity. Fisher’s ‘representative
value’ is $x_1+x_2$. With random mating we find, from what has gone previously, that

$$\tfrac12\beta^2 = \operatorname{var}(x_1) = \operatorname{var}(x_2), \qquad \delta^2 = \operatorname{var}(x_3),$$

and the three covariances between the $x$’s are zero. The correlation between $(x_1+x_2+x_3)$ and
$(x_1+x_2)$ is then

$$\frac{\operatorname{cov}(x_1+x_2+x_3,\ x_1+x_2)}{\{\operatorname{var}(x_1+x_2+x_3)\operatorname{var}(x_1+x_2)\}^{1/2}} = \frac{\beta^2}{(\alpha^2\beta^2)^{1/2}} = \frac{\beta}{\alpha}.$$


Now consider a parent and offspring with values $(x_1+x_2+x_3)$ and $(X_1+X_2+X_3)$ respectively.
We make the convention that the ‘first’ gene (which results in the contributions $x_1$ and $X_1$) is the
gene which this parent hands on to the offspring, so that $x_1 = X_1$. The second genes in the two
individuals are $A_1$ and $A_2$, with probabilities $p$ and $q$, independently of each other. Thus $x_1 = X_1$,
$x_2$ and $X_2$ are distributed independently of each other, and so are the pairs $(x_2, X_3)$ and $(x_3, X_2)$.
$x_1 = X_1$ is uncorrelated with $x_3$ and $X_3$ from what has been proved above.

It can also be shown that $x_3$ and $X_3$ are uncorrelated. This can be done as follows. Suppose that
the gene passed from parent to offspring is $A_1$. Then, using the above table, (IVa), and the fact that
$P = p^2$, $Q = pq$, $R = q^2$,

$$E(x_3 \mid \text{first gene is } A_1) = p(a-c-b) + q(d-c) = p(-2RQdT^{-1}) + q(2PRdT^{-1}) = 0.$$

Similarly, $E(X_3 \mid \text{first gene is } A_1) = 0$. Given the first gene, $x_3$ and $X_3$ depend only on the two second genes, which are independent, so that

$$E(x_3X_3 \mid \text{first gene is } A_1) = E(x_3 \mid \text{first gene is } A_1)\,E(X_3 \mid \text{first gene is } A_1) = 0.$$

The same holds if the first gene is $A_2$, and thus

$$E(x_3) = E(X_3) = E(x_3X_3) = 0,$$

so that $\operatorname{cov}(x_3, X_3) = 0$.

The five variates $x_1 = X_1$, $x_2$, $x_3$, $X_2$, $X_3$ are thus uncorrelated in pairs, so that the correlation
between $x$ and $X$ arises only through the pair $x_1$, $X_1$. Then

$$\operatorname{cov}(x, X) = \operatorname{cov}(x_1, X_1) = \operatorname{var}(x_1) = \tfrac12\beta^2.$$

Fisher does not consider the components $x_1$ and $X_1$ separately but the ‘representative values’
$(x_1+x_2)$ and $(X_1+X_2)$. Thus from his point of view the correlation between parent and offspring
arises solely from that of the representative values.

Most pairs of relatives in a population can share a gene which may be passed directly from one
relative to another as with father and son, or which may come from a common ancestor as with
brothers. They are then said to have genes which are ‘identical by descent’, as distinct from pairs
of genes which may be identical by chance. We can say that father and son have ‘one gene in
common’. Similarly, uncle and nephew have probability $\tfrac12$ of having a gene in common, first
cousins have probability $\tfrac14$ of having a gene in common, and so on. Then, arguing as above, the
correlations between $x_1, x_2, x_3, X_1, X_2, X_3$ (where $x_1+x_2+x_3$ and $X_1+X_2+X_3$ refer to the two
individuals) are all zero except that

$$\operatorname{cov}(x_1, X_1) = \tfrac12 u\beta^2,$$

where $u$ is the probability of sharing a gene.

With pairs of sibs, or double first cousins, the situation is more complicated, since the
individuals can then share two genes at once. In such a case each $x_r$ may be correlated with
$X_r$ $(r = 1, 2, 3)$, but if $r \neq s$, the pairs $(x_r, x_s)$, $(X_r, X_s)$, $(x_r, X_s)$ are uncorrelated. Thus Fisher
remarks that with sibs and other such cases, it is necessary to take into account the correlation
between the ‘dominance deviations’ $x_3$ and $X_3$.

A valuable general theory of this approach is given by Trustrum (Proc. Camb. Phil. Soc. 57 (1961),
315-320).


“The regression due to a single factor of the mean of the offspring of parents of a given array is

$$\frac12\,\frac{\beta^2}{\alpha^2},$$

and adding up the effects of all factors we find

$$\frac12\,\frac{\tau^2}{\sigma^2}. \tag{XI}$$

We may regard this formula otherwise. The correlation between the actual somatic measurements such as
$a$, $d$, $-a$, and the representative linear quantities $c+b$, $c$, $c-b$ is $\tau/\sigma$. Thus the correlation of parent and child
is made up of three factors, two of them representing the relations between the real and the representative
measurements, and the third the correlation between the representative measurements of the two relatives.
Thus the effect of dominance is simply to reduce certain relationship correlations in the ratio $\tau^2/\sigma^2$.

“The values of the correlations between the representative measurements for random mating, which may
be called the genetic correlations, are given in the accompanying table:

Generations                 Half 2nd   Half 1st   Half      Ancestral   Brother   1st cousin   2nd cousin
                            cousin     cousin     brother   line
Own                         1/64       1/16       1/4       1           1/2       1/8          1/32
Father’s                    1/128      1/32       1/8       1/2         1/4       1/16         1/64
Grandfather’s               1/256      1/64       1/16      1/4         1/8       1/32         1/128
Great-grandfather’s         1/512      1/128      1/32      1/8         1/16      1/64         1/256
Great-great-grandfather’s   1/1024     1/256      1/64      1/16        1/32      1/128        1/512




“6. The above reasoning as to the effects of dominance applies without modification to the ancestral line,
but in a special class of collaterals requires reconsideration. The reason is that the deviations from linearity are
now themselves correlated. In other words, a father who is heterozygote instead of recessive may have
offspring who show a similar variation; but they may also be changed from heterozygote to dominant. In the
case of siblings, however, whichever change takes place in one is more likely to occur in the other.

“Thus, writing $i$, $j$, $k$ for the deviations

$$a-m,\quad d-m,\quad -(a+m),$$

so that

$$iP + 2jQ + kR = 0 \tag{XII}$$

and $p^2$, $pq$, $q^2$ for $P$, $Q$, $R$, we can draw up association tables for different pairs of relatives, and readily obtain
the correlations between them by substituting the fractions in the nine sections of the table as coefficients of
a quadratic function in $i$, $j$, $k$.

“Thus the association table between parent and child is

$p^3$      $p^2q$      $0$
$p^2q$     $pq(p+q)$   $pq^2$
$0$        $pq^2$      $q^3$

from which we obtain the quadratic

$$p^3i^2 + 2p^2q\,ij + pq(p+q)j^2 + 2pq^2\,jk + q^3k^2,$$

which is equal to

$$\frac{1}{4pq}(p^2i - q^2k)^3 = \tfrac12\beta^2.$$


The association table for parent and child given by Fisher above has its columns corresponding
to the three genotypes $A_1A_1$, $A_1A_2$, $A_2A_2$ respectively, and hence to the deviations $i$, $j$, $k$. The
rows have a similar meaning for the offspring. The entries in the table are the respective
probabilities of occurrence of all combinations of father and offspring; e.g. the combination
father $A_1A_1$, offspring $A_1A_1$, has probability $p^3$. The entries can be found by putting $P = p^2$,
$Q = pq$, $R = q^2$ in Table C, using $p+q = 1$. The ‘quadratic’ under the table is the covariance
(a word which he had presumably not yet invented). This can be found directly from its definition
as a mean product of deviations, i.e. as

$$\Sigma\,(\text{prob})\,(\text{parent’s deviation})\,(\text{offspring’s deviation}) = p^3\cdot i\cdot i + p^2q\cdot i\cdot j + \dots$$
$$= p^3i^2 + 2p^2q\,ij + pq(p+q)j^2 + 2pq^2\,jk + q^3k^2$$

(there being a misprint in Fisher’s text). On substituting for $j$, using (XII), this becomes

$$\frac{1}{4pq}(ip^2 - kq^2)^2 = \tfrac12\beta^2,$$

the bracketed expression being squared and not cubed as in Fisher’s text.

To obtain the variance of father and offspring we use the formula for $\beta^2$ given before. Then

$$\beta^2 = 2a^2Q - 4Q(P-R)ad + 2Q(P-R)^2d^2$$
$$= 2a^2pq - 4pq(p-q)ad + 2pq(p-q)^2d^2$$
$$= 2pq\{a - (p-q)d\}^2.$$

Now $p^2i + 2pqj + q^2k = 0$, and since

$$a - (p-q)d = \tfrac12(i-k) - (p-q)\{j - \tfrac12(i+k)\},$$

we find

$$2\{a - (p-q)d\} = i - k + \frac{p-q}{pq}(p^2i + q^2k) + (p-q)(i+k) = \frac{1}{pq}(ip^2 - kq^2).$$

Thus the variance is

$$\beta^2 = \frac{1}{2pq}(ip^2 - kq^2)^2,$$

which is twice the covariance obtained above.


“while for brother and brother we have the table

$p^2(p+\tfrac12q)^2$     $p^2q(p+\tfrac12q)$      $\tfrac14p^2q^2$
$p^2q(p+\tfrac12q)$      $pq(p^2+3pq+q^2)$        $pq^2(\tfrac12p+q)$
$\tfrac14p^2q^2$         $pq^2(\tfrac12p+q)$      $q^2(\tfrac12p+q)^2$

which gives us a quadratic expression exceeding that for the parental correlation by the terms

$$\tfrac14p^2q^2(i^2 - 2ij + 4j^2 + 2ik - 2jk + k^2),$$

which are equal to $\tfrac14\delta^2$, and therefore give for the fraternal correlation

$$\frac{1}{2\sigma^2}(\tau^2 + \tfrac12\epsilon^2).$$


To obtain the brother-brother table we consider the table given before (Table B) of all possible
offspring of two randomly mated parents, and examine all possible fraternities. Then an
$(A_1A_1, A_1A_1)$ fraternity can arise out of a crossing $A_1A_1 \times A_1A_1$ with probability $P^2$, or out of a
crossing $A_1A_1 \times A_1A_2$ with probability $\tfrac14(2PQ+2PQ) = PQ$, or finally out of a crossing
$A_1A_2 \times A_1A_2$ with probability $\tfrac1{16}(4Q^2) = \tfrac14Q^2$. This gives us the cell in the first row and first
column of Table E and the others are obtained similarly.

TABLE E

                              Brother
Brother     $i$                       $j$                     $k$
$i$         $P^2+PQ+\tfrac14Q^2$      $PQ+\tfrac12Q^2$        $\tfrac14Q^2$
$j$         $PQ+\tfrac12Q^2$          $PQ+2PR+Q^2+QR$         $\tfrac12Q^2+QR$
$k$         $\tfrac14Q^2$             $\tfrac12Q^2+QR$        $\tfrac14Q^2+QR+R^2$

Notice this is symmetric about the leading diagonal and symmetric about the other diagonal
on interchanging $P$ and $R$. On substituting for $P$, $Q$ and $R$ we get Fisher’s table. To save algebraic
labour we subtract the previous table and get an array of the form:

$\tfrac14p^2q^2$      $-\tfrac12p^2q^2$     $\tfrac14p^2q^2$
$-\tfrac12p^2q^2$     $p^2q^2$              $-\tfrac12p^2q^2$
$\tfrac14p^2q^2$      $-\tfrac12p^2q^2$     $\tfrac14p^2q^2$

from which we immediately get the expression

$$\tfrac14p^2q^2(i - 2j + k)^2$$

($2ij$ and $2jk$ in Fisher’s result should be $4ij$ and $4jk$) and on substituting for $i$, $j$, $k$ in terms of
$a$, $d$ and $m$ this becomes $p^2q^2d^2 = \tfrac14\delta^2$.

The brother-brother correlation is therefore exactly intermediate between parent-offspring
correlations with and without the same degree of dominance.
We have set out the above argument in detail in order to show Fisher’s procedure. However,
the simplest way of finding the above brother-brother table is to use the fact that sibs have
probability $\tfrac14$ of sharing two genes in the previously used sense (and therefore of having the same
genotype at this locus), probability $\tfrac12$ of sharing one gene, and probability $\tfrac14$ of sharing no gene.
The above table is then found by adding the three corresponding $3 \times 3$ association tables, each weighted by its probability. The
same method of approach can be used in all the following tables but we follow Fisher’s method in
order to make his discussion clear.
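The shortcut just described can be sketched in Python. This is an illustrative check, not the authors' procedure: the function names `tables`, `mix` and `covariance`, and the numerical values of $p$, $a$, $d$, are assumptions. The three component tables are those for relatives sharing two genes (identical genotypes), one gene (the parent-child table) and no gene (independent genotypes).

```python
from fractions import Fraction as F

def tables(p):
    """Association tables for relatives sharing two, one, or no genes by descent."""
    q = 1 - p
    freq = [p * p, 2 * p * q, q * q]
    two = [[freq[r] if r == s else F(0) for s in range(3)] for r in range(3)]
    one = [[p**3, p * p * q, F(0)],
           [p * p * q, p * q, p * q * q],
           [F(0), p * q * q, q**3]]                  # Fisher's parent-child table
    none = [[freq[r] * freq[s] for s in range(3)] for r in range(3)]
    return two, one, none

def mix(p, k2, k1, k0):
    """Weighted sum of the three tables with sharing probabilities k2, k1, k0."""
    two, one, none = tables(p)
    return [[k2 * two[r][s] + k1 * one[r][s] + k0 * none[r][s]
             for s in range(3)] for r in range(3)]

def covariance(table, dev):
    """Mean product of deviations over the nine cells of an association table."""
    return sum(table[r][s] * dev[r] * dev[s] for r in range(3) for s in range(3))

# Hypothetical illustrative values.
p, a, d = F(1, 3), F(3), F(2)
q = 1 - p
m = (p - q) * a + 2 * p * q * d
dev = [a - m, d - m, -a - m]                         # i, j, k
beta_sq = 2 * p * q * (a - (p - q) * d) ** 2
delta_sq = 4 * p * p * q * q * d * d

parent_child = mix(p, F(0), F(1), F(0))
brothers = mix(p, F(1, 4), F(1, 2), F(1, 4))
```

For these values the parent-child covariance is $\tfrac12\beta^2$, the brother-brother covariance is $\tfrac12\beta^2 + \tfrac14\delta^2$, and the entries of `brothers` coincide with Fisher's table.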


“The effect of dominance is to reduce the fraternal correlation to only half the extent to which the parental
correlation is reduced. This allows us to distinguish, as far as the accuracy of the existing figures allows, between
the random external effects of environment and those of dominance. This halving of the effect of dominance,
it is important to notice, is independent of the relative importance of different factors, of their different degrees
of dominance, and of the different proportions in which their phases occur. The correlation between the
dominance deviations of siblings is, in all cases, $\tfrac14$.

“7. To investigate the cases of uncles and cousins we must deal with all the possible types of mating down
to the second generation. The three Mendelian phases will yield six types of mating, and ordinary cousinships
are therefore connected by one of six types of sibship. The especially interesting case of double cousins, in
which two members of one sibship mate with two members of another, can occur in twenty-one distinct ways,
since any pair of the six types of sibship may be taken. The proportionate numbers of the three Mendelian
phases in the children produced by the random matings of such pairs of sibships is given in the accompanying
table:

Type of sibship ...   1.0.0     1.1.0      0.1.0       1.2.1       0.1.1      0.0.1
Frequency ...         $p^4$     $4p^3q$    $2p^2q^2$   $4p^2q^2$   $4pq^3$    $q^4$
$p^4$                 1.0.0     3.1.0      1.1.0       1.1.0       1.3.0      0.1.0
$4p^3q$               3.1.0     9.6.1      3.4.1       3.4.1       3.10.3     0.3.1
$2p^2q^2$             1.1.0     3.4.1      1.2.1       1.2.1       1.4.3      0.1.1
$4p^2q^2$             1.1.0     3.4.1      1.2.1       1.2.1       1.4.3      0.1.1
$4pq^3$               1.3.0     3.10.1     1.4.3       1.4.3       1.6.9      0.1.3
$q^4$                 0.1.0     0.3.1      0.1.1       0.1.1       0.1.3      0.0.1

                      $p.q.0$;  $\tfrac{3p}{4}.\tfrac{p+3q}{4}.\tfrac{q}{4}$;  $\tfrac{p}{2}.\tfrac12.\tfrac{q}{2}$;  $\tfrac{p}{2}.\tfrac12.\tfrac{q}{2}$;  $\tfrac{p}{4}.\tfrac{3p+q}{4}.\tfrac{3q}{4}$;  $0.p.q$

“The lowest line gives the proportions of the phases in the whole cousinship whose connecting sibship is of
each of the six types.


To discuss uncle-nephew relationships and cousins we have to consider three generations
because we must first calculate the different probabilities of various classes of sibship which can
arise from a random mating of unrelated pairs. This is done in Table F.

TABLE F

                                               Relative frequency of sibs
Type of mating            Probability of mating    $A_1A_1$   $A_1A_2$   $A_2A_2$
$A_1A_1 \times A_1A_1$    $p^4$                    1          0          0
$A_1A_1 \times A_1A_2$    $4p^3q$                  1          1          0
$A_1A_1 \times A_2A_2$    $2p^2q^2$                0          1          0
$A_1A_2 \times A_1A_2$    $4p^2q^2$                1          2          1
$A_1A_2 \times A_2A_2$    $4pq^3$                  0          1          1
$A_2A_2 \times A_2A_2$    $q^4$                    0          0          1

We may illustrate the meaning of this table by saying that the mating $A_1A_2 \times A_1A_2$ has
probability $4p^2q^2$ of occurring and that each of its offspring has (independently) the probabilities
$\tfrac14$, $\tfrac12$ and $\tfrac14$ of being $A_1A_1$, $A_1A_2$ or $A_2A_2$. Such a sibship is denoted by the symbol (1.2.1).

Fisher’s $6 \times 6$ table is a table giving relative frequencies of the three genetic types in the
offspring from a mating in which it is known that one parent comes from one of the above specified
sibships and one from another. (Note that the entry 3.10.1 in the fifth row and second column of
the $6 \times 6$ table should be 3.10.3.) Thus the offspring of a mating between an individual out of a
sibship whose parental cross was $A_1A_1 \times A_1A_2$, and an individual from a sibship produced by a
mating $A_1A_2 \times A_2A_2$, will be of types $A_1A_1$, $A_1A_2$ and $A_2A_2$ with probabilities $\tfrac{3}{16}$, $\tfrac{10}{16}$, $\tfrac{3}{16}$.

To construct this table it is convenient to regard such symbols as (1.1.0), etc., as row
vectors. To obtain any entry in the table we premultiply the vector corresponding to the column
by the transpose of the vector corresponding to the row. Thus in the above case we take

$$(1, 1, 0)'\,(0, 1, 1) = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

Each element of the resulting $3 \times 3$ matrix is then multiplied by the corresponding vector in the
following matrix, and the products summed. This matrix is

(4.0.0)   (2.2.0)   (0.4.0)
(2.2.0)   (1.2.1)   (0.2.2)
(0.4.0)   (0.2.2)   (0.0.4)

These give relative frequencies of offspring as derived from Table F. Thus the matrix

$$\begin{pmatrix} 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}$$

gives (2.2.0) + (0.4.0) + (1.2.1) + (0.2.2) = (3.10.3), which is the required result.
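The vector construction just described is easy to mechanise. The sketch below is an illustration only; the names `OFFSPRING`, `SIBSHIPS` and `cross` are hypothetical. It weights the offspring ratios of each pair of parental genotypes by the outer product of the two sibship vectors and reduces the result to lowest terms.

```python
from functools import reduce
from math import gcd

# Offspring ratios (A1A1, A1A2, A2A2) for each pair of parental genotypes,
# scaled by 4 so that every entry is an integer (the matrix of vectors above).
OFFSPRING = [
    [(4, 0, 0), (2, 2, 0), (0, 4, 0)],
    [(2, 2, 0), (1, 2, 1), (0, 2, 2)],
    [(0, 4, 0), (0, 2, 2), (0, 0, 4)],
]

# The six types of sibship, in the order of Table F.
SIBSHIPS = [(1, 0, 0), (1, 1, 0), (0, 1, 0), (1, 2, 1), (0, 1, 1), (0, 0, 1)]

def cross(row_sibship, col_sibship):
    """Relative frequencies of offspring when one parent is drawn from each sibship."""
    total = [0, 0, 0]
    for r, weight_r in enumerate(row_sibship):
        for s, weight_s in enumerate(col_sibship):
            for t in range(3):
                total[t] += weight_r * weight_s * OFFSPRING[r][s][t]
    common = reduce(gcd, total)          # reduce to lowest terms
    return tuple(x // common for x in total)

# The full 6 x 6 table, rows and columns in the order of SIBSHIPS.
table = [[cross(r, c) for c in SIBSHIPS] for r in SIBSHIPS]
```

In this sketch `cross((1, 1, 0), (0, 1, 1))` and `cross((0, 1, 1), (1, 1, 0))` both give `(3, 10, 3)`, in line with the correction noted above for the fifth row and second column.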

The table is symmetric about the leading diagonal and has a number of other symmetries.
If an individual from a sibship $S_i$ is mated with an individual chosen at random from the whole
population, the three types of individual will occur in the offspring with the probabilities given in
the last row. Thus if a member of a sibship of type (1.1.0) is mated in this way, the offspring will
be $A_1A_1$, $A_1A_2$ and $A_2A_2$ with probabilities

$$\tfrac34p,\quad \tfrac14(p+3q),\quad \tfrac14q.$$

This can be seen directly or by summing the probabilities corresponding to the elements of the
columns of the $6 \times 6$ table after multiplying each by the probabilities of the rows, and then
rescaling to obtain total probability equal to unity. Thus if two individuals are cousins, and
connected by a given one of the above sibships, and are not related in any other way, each will
belong to $A_1A_1$, $A_1A_2$ or $A_2A_2$ with probabilities given by the last line.


“If we pick out all possible pairs of uncle (or aunt) and nephew (or niece) we obtain the table

$p^3(p+\tfrac12q)$        $\tfrac12p^2q(3p+q)$         $\tfrac12p^2q^2$
$\tfrac12p^2q(3p+q)$      $\tfrac12pq(p^2+6pq+q^2)$    $\tfrac12pq^2(p+3q)$
$\tfrac12p^2q^2$          $\tfrac12pq^2(p+3q)$         $q^3(\tfrac12p+q)$

the quadratic from which reduces exactly to $\tfrac14\beta^2$, showing that when mating is at random the avuncular
correlation is exactly one half of the paternal.”
The uncle-nephew table can be constructed from first principles by combining the previous
brother-brother table with the parent-offspring table, or it can be constructed from the above
$6 \times 6$ table. Consider the latter method. Suppose that the uncle is the brother of the nephew’s
father. There are six sibships in which the father and uncle can occur and these are represented
by Fisher by the symbols (1.0.0), (1.1.0), (0.1.0), (1.2.1), (0.1.1), (0.0.1) at the top of
the six columns of the table. The components of these row vectors represent relative
frequencies and not probabilities. We therefore convert them into probabilities so that we obtain

$$(1.0.0),\ (\tfrac12.\tfrac12.0),\ (0.1.0),\ (\tfrac14.\tfrac12.\tfrac14),\ (0.\tfrac12.\tfrac12),\ (0.0.1).$$

These sibships occur with probabilities

$$p^4,\ 4p^3q,\ 2p^2q^2,\ 4p^2q^2,\ 4pq^3,\ q^4$$

respectively and the corresponding probabilities of the $A_1A_1$, $A_1A_2$ and $A_2A_2$ in the nephew are
given by the last row of the table. If the vectors of the last row of the table are turned into
column vectors $(p.q.0)'$, ..., etc., the $3 \times 3$ association table will have 9 elements which are the
elements of the $3 \times 3$ matrix

$$p^4(p.q.0)'(1.0.0) + 4p^3q\,(\tfrac34p.\tfrac14(p+3q).\tfrac14q)'(\tfrac12.\tfrac12.0) + 2p^2q^2(\tfrac12p.\tfrac12.\tfrac12q)'(0.1.0)$$
$$+\ 4p^2q^2(\tfrac12p.\tfrac12.\tfrac12q)'(\tfrac14.\tfrac12.\tfrac14) + 4pq^3(\tfrac14p.\tfrac14(3p+q).\tfrac34q)'(0.\tfrac12.\tfrac12) + q^4(0.p.q)'(0.0.1)$$

$$= p^4\begin{pmatrix} p & 0 & 0 \\ q & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} + 2p^3q\begin{pmatrix} \tfrac34p & \tfrac34p & 0 \\ \tfrac14(p+3q) & \tfrac14(p+3q) & 0 \\ \tfrac14q & \tfrac14q & 0 \end{pmatrix} + 2p^2q^2\begin{pmatrix} 0 & \tfrac12p & 0 \\ 0 & \tfrac12 & 0 \\ 0 & \tfrac12q & 0 \end{pmatrix}$$
$$+\ p^2q^2\begin{pmatrix} \tfrac12p & p & \tfrac12p \\ \tfrac12 & 1 & \tfrac12 \\ \tfrac12q & q & \tfrac12q \end{pmatrix} + 2pq^3\begin{pmatrix} 0 & \tfrac14p & \tfrac14p \\ 0 & \tfrac14(3p+q) & \tfrac14(3p+q) \\ 0 & \tfrac34q & \tfrac34q \end{pmatrix} + q^4\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & p \\ 0 & 0 & q \end{pmatrix},$$

and adding these we obtain the uncle-nephew table given by Fisher. Notice that this table is
symmetric although the relationship is not. The rows correspond to the nephew and the columns
to the uncle. Inserting the values $i$, $j$, $k$ and multiplying each element of the matrix by the corresponding
product of $i$’s, $j$’s and $k$’s we get a formula for the covariance which begins

$$p^3(p+\tfrac12q)i^2 + 2\cdot\tfrac12p^2q(3p+q)\,ij + \dots$$

Substituting for $j = -(p^2i + q^2k)/2pq$, the covariance reduces to

$$\frac{1}{8pq}(p^2i - q^2k)^2 = \tfrac14\beta^2.$$

Thus there is no correlation due to dominance.
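The sum over the six connecting sibships can be carried out in exact arithmetic. In this Python sketch the function name `avuncular_table` and the numerical values are hypothetical. The nephew's distribution for each sibship is obtained by mating a random member of the sibship (the father) with a random mate, which is how the last row of Fisher's table arises.

```python
from fractions import Fraction as F

def avuncular_table(p):
    """Association table (rows: nephew, columns: uncle) summed over sibship types."""
    q = 1 - p
    # (probability of the sibship, genotype probabilities of one of its members)
    sibships = [
        (p**4,              (F(1), F(0), F(0))),
        (4 * p**3 * q,      (F(1, 2), F(1, 2), F(0))),
        (2 * p * p * q * q, (F(0), F(1), F(0))),
        (4 * p * p * q * q, (F(1, 4), F(1, 2), F(1, 4))),
        (4 * p * q**3,      (F(0), F(1, 2), F(1, 2))),
        (q**4,              (F(0), F(0), F(1))),
    ]
    table = [[F(0)] * 3 for _ in range(3)]
    for prob, uncle in sibships:
        g = uncle[0] + uncle[1] / 2                  # A1 frequency in the father's gametes
        nephew = (g * p, g * q + (1 - g) * p, (1 - g) * q)   # father x random mate
        for r in range(3):
            for s in range(3):
                table[r][s] += prob * nephew[r] * uncle[s]
    return table

# Hypothetical illustrative values.
p, a, d = F(1, 3), F(3), F(2)
q = 1 - p
m = (p - q) * a + 2 * p * q * d
dev = [a - m, d - m, -a - m]                         # i, j, k
beta_sq = 2 * p * q * (a - (p - q) * d) ** 2

table = avuncular_table(p)
cov = sum(table[r][s] * dev[r] * dev[s] for r in range(3) for s in range(3))
```

For these values `table` reproduces the entries of Fisher's avuncular table and `cov` equals $\tfrac14\beta^2$, with no contribution from $\delta^2$.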


“From the twenty-one types of double cousinship pairs may be picked, the proportions of which are shown
in the table:

$p^2(p+\tfrac14q)^2$          $\tfrac32p^2q(p+\tfrac14q)$           $\tfrac{9}{16}p^2q^2$
$\tfrac32p^2q(p+\tfrac14q)$   $\tfrac12pq(p^2+6\tfrac12pq+q^2)$     $\tfrac32pq^2(\tfrac14p+q)$
$\tfrac{9}{16}p^2q^2$         $\tfrac32pq^2(\tfrac14p+q)$           $q^2(\tfrac14p+q)^2$

which agrees with the table given by Snow for ordinary first cousins. I cannot explain this divergence, unless
it be that Snow is in error, my values for ordinary first cousins leading to less than half this value for the
correlation. Simplifying the quadratic in $i$, $j$, $k$, which is most easily done in this case by comparison with the
avuncular table, we find for the correlation of double cousins

$$\frac{1}{4\sigma^2}(\tau^2 + \tfrac14\epsilon^2),$$

showing that double cousins, like brothers, show some similarity in the distribution of deviations due to
dominance, and that with these cousins the correlation will in general be rather higher than it is for uncle
and nephew.”


Double cousinship is more complicated. Suppose the cousins are such that the two fathers
come from one sibship and the two mothers from another. There are therefore 36 possibilities of
which it is only necessary to consider 21 by symmetry. In the $6 \times 6$ table the individual entries
are 3 element vectors whose components are proportional to the frequencies of $A_1A_1$, $A_1A_2$ and
$A_2A_2$ in the progeny of a mating between individuals chosen from these sibships. Double cousins
are the results of independent choice of pairs from the same sibships in this way. We can therefore
construct a $6 \times 6$ table (Table G) in each cell of which we have first the probability that the two
corresponding sibships have been chosen and then a $3 \times 3$ matrix whose elements are proportional
to the probabilities of the three genetic phases in the two double cousins. It is more convenient
to enter the matrix with numbers which are only proportional to the probabilities and not equal
to them, as in this way we avoid the use of fractions. To obtain the probabilities it is necessary to
divide each element by the sum of all the elements in the matrix.

TABLE G

$p^8$          $4p^7q$        $2p^6q^2$      $4p^6q^2$      $4p^5q^3$      $p^4q^4$
1  0  0        9  3  0        1  1  0        1  1  0        1  3  0        0  0  0
0  0  0        3  1  0        1  1  0        1  1  0        3  9  0        0  1  0
0  0  0        0  0  0        0  0  0        0  0  0        0  0  0        0  0  0

               $16p^6q^2$     $8p^5q^3$      $16p^5q^3$     $16p^4q^4$     $4p^3q^5$
               81 54  9       9 12  3        9 12  3        9  30  9       0  0  0
               54 36  6       12 16 4        12 16 4        30 100 30      0  9  3
               9  6  1        3  4  1        3  4  1        9  30  9       0  3  1

                              $4p^4q^4$      $8p^4q^4$      $8p^3q^5$      $2p^2q^6$
                              1  2  1        1  2  1        1  4  3        0  0  0
                              2  4  2        2  4  2        4 16 12        0  1  1
                              1  2  1        1  2  1        3 12  9        0  1  1

                                             $16p^4q^4$     $16p^3q^5$     $4p^2q^6$
                                             1  2  1        1  4  3        0  0  0
                                             2  4  2        4 16 12        0  1  1
                                             1  2  1        3 12  9        0  1  1

                                                            $16p^2q^6$     $4pq^7$
                                                            1  6  9        0  0  0
                                                            6 36 54        0  1  3
                                                            9 54 81        0  3  9

                                                                           $q^8$
                                                                           0  0  0
                                                                           0  0  0
                                                                           0  0  1
Thus if one of the connecting sibships corresponds to the symbol (0.1.0) and the other to
(1.1.0), the vector given in Fisher’s $6 \times 6$ table is (3.4.1), and the contribution to the covariance
table will have elements proportional to

$$(3.4.1)'(3.4.1) = \begin{pmatrix} 9 & 12 & 3 \\ 12 & 16 & 4 \\ 3 & 4 & 1 \end{pmatrix},$$

and since the corresponding probability is

$$(4p^3q) \times (2p^2q^2) = 8p^5q^3,$$

the sum of the elements of the matrix is 64, and there is another equal contribution from the
matrix situated symmetrically on the other side of the main diagonal, the contribution to the
covariance table is

$$\tfrac14p^5q^3\begin{pmatrix} 9 & 12 & 3 \\ 12 & 16 & 4 \\ 3 & 4 & 1 \end{pmatrix}.$$

The empty cells are obtained by symmetry. Multiplying each matrix by its probability and by the reciprocal of
the sum of its elements, and adding, we check Fisher’s table for double cousins.
The difference of this table from the uncle-nephew table is

$\tfrac{1}{16}p^2q^2$     $-\tfrac18p^2q^2$     $\tfrac{1}{16}p^2q^2$
$-\tfrac18p^2q^2$         $\tfrac14p^2q^2$      $-\tfrac18p^2q^2$
$\tfrac{1}{16}p^2q^2$     $-\tfrac18p^2q^2$     $\tfrac{1}{16}p^2q^2$

which gives a term

$$\tfrac{1}{16}p^2q^2(i - 2j + k)^2 = \tfrac{1}{16}\delta^2,$$

and the correlation of double cousins is therefore

$$\frac{1}{4\sigma^2}(\tau^2 + \tfrac14\epsilon^2).$$

Notice that the double cousin table is necessarily symmetric.
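The Table G computation amounts to a double sum over the two connecting sibships. The Python sketch below is illustrative; `double_cousin_table`, the gamete-frequency shortcut and the numerical values are assumptions. It works with probabilities rather than integer proportions, so no rescaling by the sum of the matrix elements is needed.

```python
from fractions import Fraction as F

def double_cousin_table(p):
    """Association table for double cousins, summed over all 36 pairs of sibships."""
    q = 1 - p
    freqs = [p**4, 4 * p**3 * q, 2 * p * p * q * q,
             4 * p * p * q * q, 4 * p * q**3, q**4]
    # Frequency of A1 among the gametes of a random member of each sibship type,
    # in the order (1.0.0), (1.1.0), (0.1.0), (1.2.1), (0.1.1), (0.0.1).
    gametes = [F(1), F(3, 4), F(1, 2), F(1, 2), F(1, 4), F(0)]
    table = [[F(0)] * 3 for _ in range(3)]
    for f_row, g_row in zip(freqs, gametes):
        for f_col, g_col in zip(freqs, gametes):
            # Genotype probabilities of a child of (sib from row) x (sib from column).
            v = (g_row * g_col,
                 g_row * (1 - g_col) + (1 - g_row) * g_col,
                 (1 - g_row) * (1 - g_col))
            for r in range(3):
                for s in range(3):
                    table[r][s] += f_row * f_col * v[r] * v[s]
    return table

# Hypothetical illustrative values.
p, a, d = F(1, 3), F(3), F(2)
q = 1 - p
m = (p - q) * a + 2 * p * q * d
dev = [a - m, d - m, -a - m]                         # i, j, k
beta_sq = 2 * p * q * (a - (p - q) * d) ** 2
delta_sq = 4 * p * p * q * q * d * d

table = double_cousin_table(p)
cov = sum(table[r][s] * dev[r] * dev[s] for r in range(3) for s in range(3))
```

For these values `cov` equals $\tfrac14\beta^2 + \tfrac1{16}\delta^2$, which corresponds to the correlation $(\tau^2 + \tfrac14\epsilon^2)/4\sigma^2$ when summed over factors.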


“For ordinary first cousins I find the following table of the distribution of random pairs drawn from the
six types of ordinary cousinship:

$\tfrac14p^3(4p+q)$        $\tfrac14p^2q^2(7p+q)$        $\tfrac34p^2q^2$
$\tfrac14p^2q(7p+q)$       $\tfrac14pq(p^2+14pq+q^2)$    $\tfrac14pq^2(p+7q)$
$\tfrac34p^2q^2$           $\tfrac14pq^2(p+7q)$          $\tfrac14q^3(p+4q)$

which yields the correlation $\dfrac18\,\dfrac{\tau^2}{\sigma^2}$.”


Ordinary first cousins are connected by a single sibship. They are therefore each the result of
the mating of one of the sibships in Fisher’s $6 \times 6$ table with a mate chosen at random. We can
therefore divide all first cousins into 6 classes according to the type of connecting sibship and the
covariance table is the sum of 6 tables. Each of the latter is obtained by multiplying the
probabilities of the connecting sibship by the matrix obtained by the column into row product of
the last row of Fisher’s $6 \times 6$ table by itself. Thus for example the first of these is $(p.q.0)$ with
probability $p^4$, and its contribution is

$$p^4(p.q.0)'(p.q.0) = p^4\begin{pmatrix} p^2 & pq & 0 \\ pq & q^2 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

The sum of all these is

$$p^4\begin{pmatrix} p^2 & pq & 0 \\ pq & q^2 & 0 \\ 0 & 0 & 0 \end{pmatrix} + 4p^3q\begin{pmatrix} \tfrac{9}{16}p^2 & \tfrac1{16}(3p^2+9pq) & \tfrac3{16}pq \\ \tfrac1{16}(3p^2+9pq) & \tfrac1{16}(p+3q)^2 & \tfrac1{16}(pq+3q^2) \\ \tfrac3{16}pq & \tfrac1{16}(pq+3q^2) & \tfrac1{16}q^2 \end{pmatrix}$$
$$+\ 6p^2q^2\begin{pmatrix} \tfrac14p^2 & \tfrac14p & \tfrac14pq \\ \tfrac14p & \tfrac14 & \tfrac14q \\ \tfrac14pq & \tfrac14q & \tfrac14q^2 \end{pmatrix} + 4pq^3\begin{pmatrix} \tfrac1{16}p^2 & \tfrac1{16}(3p^2+pq) & \tfrac3{16}pq \\ \tfrac1{16}(3p^2+pq) & \tfrac1{16}(3p+q)^2 & \tfrac1{16}(9pq+3q^2) \\ \tfrac3{16}pq & \tfrac1{16}(9pq+3q^2) & \tfrac9{16}q^2 \end{pmatrix} + q^4\begin{pmatrix} 0 & 0 & 0 \\ 0 & p^2 & pq \\ 0 & pq & q^2 \end{pmatrix}.$$

Adding these we obtain Fisher’s cousin table which is checked except for the entry in the first
row and second column which should be

$$\tfrac14p^2q(7p+q).$$

This table is necessarily symmetric.
Calculating the covariance and subtracting

$$(p^2i + 2pqj + q^2k)^2 = 0,$$

we get

$$\frac{1}{16pq}(p^2i - q^2k)^2 = \tfrac18\beta^2,$$

so that for single cousins there is no dominance component in the correlation.
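The sum of the six cousinship contributions can be checked in the same way. In this Python sketch the name `first_cousin_table` and the numerical values are hypothetical. Each connecting sibship contributes its probability times the outer product of the corresponding vector from the last row of Fisher's $6 \times 6$ table with itself.

```python
from fractions import Fraction as F

def first_cousin_table(p):
    """Association table for ordinary first cousins, summed over connecting sibships."""
    q = 1 - p
    freqs = [p**4, 4 * p**3 * q, 2 * p * p * q * q,
             4 * p * p * q * q, 4 * p * q**3, q**4]
    # Frequency of A1 among the gametes of a random member of each sibship type.
    gametes = [F(1), F(3, 4), F(1, 2), F(1, 2), F(1, 4), F(0)]
    table = [[F(0)] * 3 for _ in range(3)]
    for f, g in zip(freqs, gametes):
        # Sibship member x random mate: an entry of the last row of Fisher's table.
        v = (g * p, g * q + (1 - g) * p, (1 - g) * q)
        for r in range(3):
            for s in range(3):
                table[r][s] += f * v[r] * v[s]
    return table

# Hypothetical illustrative values.
p, a, d = F(1, 3), F(3), F(2)
q = 1 - p
m = (p - q) * a + 2 * p * q * d
dev = [a - m, d - m, -a - m]                         # i, j, k
beta_sq = 2 * p * q * (a - (p - q) * d) ** 2

table = first_cousin_table(p)
cov = sum(table[r][s] * dev[r] * dev[s] for r in range(3) for s in range(3))
```

For these values the entry in the first row and second column is $\tfrac14p^2q(7p+q)$, in agreement with the correction above, and `cov` equals $\tfrac18\beta^2$.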


“In a similar way the more distant kin may be investigated, but since for them reliable data have not yet
been published, the table already given of genetic correlations will be a sufficient guide.

“8. Before extending the above results to the more difficult conditions of assortative mating, it is desirable
to show how our methods may be developed so as to include the statistical feature to which we have applied
the term Epistacy. The combination of two Mendelian factors gives rise to nine distinct phases, and there is
no biological reason for supposing that nine such distinct measurements should be exactly represented by the
nine deviations formed by adding $i$, $j$, or $k$ to $i'$, $j'$, or $k'$. If we suppose that $i$, $j$, $k$, $i'$, $j'$, $k'$ have been so chosen
as to represent the nine actual types with the least square error, we have now to deal with additional quantities,
which we may term

$e_{11}$   $e_{12}$   $e_{13}$
$e_{21}$   $e_{22}$   $e_{23}$
$e_{31}$   $e_{32}$   $e_{33}$

connected by the six equations, five of which are independent,

$$p^2e_{11} + 2pq\,e_{21} + q^2e_{31} = 0, \qquad p'^2e_{11} + 2p'q'e_{12} + q'^2e_{13} = 0,$$
$$p^2e_{12} + 2pq\,e_{22} + q^2e_{32} = 0, \qquad p'^2e_{21} + 2p'q'e_{22} + q'^2e_{23} = 0,$$
$$p^2e_{13} + 2pq\,e_{23} + q^2e_{33} = 0, \qquad p'^2e_{31} + 2p'q'e_{32} + q'^2e_{33} = 0.\text{”}$$


The definitions of $i$, $j$, $k$ are now modified. Suppose that we have two non-linked loci at which
the genotypes are $A_1A_1$, $A_1A_2$, $A_2A_2$ and $B_1B_1$, $B_1B_2$, $B_2B_2$. Let their effect in combination be
$a_{11}$, ... So we have Table H.

TABLE H

                     $B_1B_1$    $B_1B_2$    $B_2B_2$
                     $i'$        $j'$        $k'$
$A_1A_1$    $i$      $a_{11}$    $a_{12}$    $a_{13}$
$A_1A_2$    $j$      $a_{21}$    $a_{22}$    $a_{23}$
$A_2A_2$    $k$      $a_{31}$    $a_{32}$    $a_{33}$

We assume random mating and put $p$, $p'$ for the frequencies of $A_1$ and $B_1$ respectively. We also
assume that random mating has been occurring in the population for a sufficiently long time for
the frequencies of the genotypes $A_1A_1B_1B_1$, $A_1A_2B_1B_1$, etc., to have attained their limiting
values $p^2p'^2$, $(2pq)p'^2$, etc.

The values $i$, $j$, $k$, $i'$, $j'$, $k'$ are now chosen so that the sum of the corresponding values for the two
loci represents $a_{11}$, $a_{12}$, ... as closely as possible in the sense of least squares. Write

$$e_{11} = a_{11} - i - i', \qquad e_{23} = a_{23} - j - k',$$
$$e_{12} = a_{12} - i - j', \qquad e_{31} = a_{31} - k - i',$$
$$e_{13} = a_{13} - i - k', \qquad e_{32} = a_{32} - k - j',$$
$$e_{21} = a_{21} - j - i', \qquad e_{33} = a_{33} - k - k',$$
$$e_{22} = a_{22} - j - j'.$$

Then we want to minimize the sum

$$S_2 = p^2p'^2e_{11}^2 + 2p^2p'q'e_{12}^2 + p^2q'^2e_{13}^2 + 2pq\,p'^2e_{21}^2 + 4pq\,p'q'e_{22}^2 + 2pq\,q'^2e_{23}^2 + q^2p'^2e_{31}^2 + 2q^2p'q'e_{32}^2 + q^2q'^2e_{33}^2.$$

Differentiating $S_2$ with respect to $i$, $j$, $k$ and $i'$, $j'$, $k'$ we get the six equations given above by
Fisher. This process is exactly analogous to estimating row and column effects in an experiment
in which rows and columns are orthogonal, the orthogonality being here a consequence of
independent distribution of the two factors. Since the addition of a constant to all the $a$’s makes
no difference, we can choose the latter so that the mean of $i$, $j$, $k$ and the mean of $i'$, $j'$, $k'$ are
both zero. This means that

$$p^2i + 2pqj + q^2k = 0, \qquad p'^2i' + 2p'q'j' + q'^2k' = 0. \tag{XII a}$$

Of the six equations only five can be independent in general, for if we multiply the first three
by $p'^2$, $2p'q'$, $q'^2$ respectively and add, we get the same result as multiplying the second three by
$p^2$, $2pq$, $q^2$ and adding. Thus only $4 = 9 - 5$ of the $e$’s can be varied, and the epistatic and dominance
relations arising from two different factors require four constants for their definition.
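The count of independent equations can be confirmed numerically. The sketch below is an illustration in Python; the helper names `rank` and `constraints` and the gene frequencies are hypothetical. It writes the six equations as a $6 \times 9$ coefficient matrix in the unknowns $e_{11}, e_{12}, \dots, e_{33}$ and finds its rank by exact Gaussian elimination.

```python
from fractions import Fraction as F

def rank(rows):
    """Rank of a matrix of Fractions by Gauss-Jordan elimination."""
    rows = [list(r) for r in rows]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(found, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for i in range(len(rows)):
            if i != found and rows[i][col] != 0:
                factor = rows[i][col] / rows[found][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[found])]
        found += 1
    return found

def constraints(p, p2):
    """Coefficient rows of the six equations; unknown e_rs sits at index 3*r + s."""
    q, q2 = 1 - p, 1 - p2
    wa = [p * p, 2 * p * q, q * q]          # genotype frequencies at the A locus
    wb = [p2 * p2, 2 * p2 * q2, q2 * q2]    # genotype frequencies at the B locus
    eqs = []
    for s0 in range(3):                     # sum over A genotypes, B genotype fixed
        eqs.append([wa[r] if s == s0 else F(0) for r in range(3) for s in range(3)])
    for r0 in range(3):                     # sum over B genotypes, A genotype fixed
        eqs.append([wb[s] if r == r0 else F(0) for r in range(3) for s in range(3)])
    return eqs

# Hypothetical gene frequencies p and p'.
equations = constraints(F(1, 3), F(1, 4))
independent = rank(equations)
free_constants = 9 - independent
```

For these frequencies `independent` is 5 and `free_constants` is 4, matching the argument above.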


“This is a complete representation of any such deviations from linearity as may exist between two factors.
Such dual epistacy, as we may term it, is the only kind of which we shall treat. More complex connections
could doubtless exist, but the number of unknowns introduced by dual epistacy alone, four, is more than can
be determined by existing data. In addition it is very improbable that any statistical effect, of a nature other
than that which we are considering, is actually produced by more complex somatic connections.

“The full association table between two relatives, when we are considering two distinct Mendelian factors,
consists of eighty-one cells, and the quadratic expression to which it leads now involves the nine epistatic
deviations. A remarkable simplification is, however, possible, since each quantity, such as $e_{21}$, which refers
to a partially or wholly heterozygous individual, is related to two other quantities, such as $e_{11}$ and $e_{31}$, by
just the same equation as that by which $j$ is related to $i$ and $k$, and occurs in the $9 \times 9$ table with corresponding
coefficients. The elimination of the five deviations $e_{21}$, $e_{12}$, $e_{32}$, $e_{23}$, $e_{22}$ is therefore effected by rewriting the $9 \times 9$
table as a $4 \times 4$ table, derived from the quadratic in $i$ and $k$ corresponding to the relationship considered.”


By the method of definition we can split the contribution to the total character due to the genes
at the $A$ and $B$ loci into seven components,

$$x = x_1 + x_2 + x_3 + x_1' + x_2' + x_3' + x_4,$$

where $x_1$ is the effect due to the ‘first’ gene at the $A$ locus (e.g. that inherited from father), $x_2$ the
effect due to the second gene at the first locus (e.g. that from mother), and $x_3$ is the deviation from
linearity due to dominance. Thus $x_1+x_2+x_3 = i$, $j$, $k$ according as the genotype is $A_1A_1$, $A_1A_2$, or
$A_2A_2$. $x_1'$, $x_2'$, $x_3'$ are the corresponding components at the second locus, and $x_4$ is the ‘epistacy’
deviation.

We have already seen that $x_1$, $x_2$, $x_3$ are uncorrelated in pairs, and the same holds good for
$x_1'$, $x_2'$, $x_3'$. The latter are also statistically independent of $x_1$, $x_2$, $x_3$ because they are at an unlinked
locus.

It also follows from the set of equations such as

$$p^2e_{11} + 2pq\,e_{21} + q^2e_{31} = 0,$$

that for any fixed genotype, such as $B_1B_1$, at the second locus, the mean value of $x_4$ $(= e_{rs})$ is zero.
Thus $x_4$ is uncorrelated with the effects $x_1'$, $x_2'$, $x_3'$ at the second locus. A similar argument holds
for the first locus. This could also have been seen from Least Square Regression Theory.

The seven components above are therefore uncorrelated between themselves, and the variance
of $x$ decomposes into seven orthogonal components,

$$\operatorname{var} x = \Sigma\operatorname{var}(x_r) + \Sigma\operatorname{var}(x_r') + \operatorname{var}(x_4).$$

We can similarly write the value, $X$, of the character for a relative as

$$X = X_1 + X_2 + X_3 + X_1' + X_2' + X_3' + X_4,$$

where the $X_r$ have been numbered in such a way that any genes shared by the two relatives will
affect only $x_r$ and $X_r$ (or $x_r'$ and $X_r'$) with the same suffix. Then because of this numbering and the
previous results,

$$\operatorname{cov}(x, X) = \Sigma\operatorname{cov}(x_r, X_r) + \Sigma\operatorname{cov}(x_r', X_r') + \operatorname{cov}(x_4, X_4).$$

The only terms that therefore remain to be considered are $\operatorname{var}(x_4)$, $\operatorname{var}(X_4)$, and $\operatorname{cov}(x_4, X_4)$.
var (#4) can be written 
p{ prety + 2pged + Ges} + 2p'g {peje + 2qede + Peso} + TPs + 2pgeds + Gress} 
= p"?A+2p'q'B+qC, say. 
We can treat each of the quadratic forms A, B, C in the same way. Consider the first. This is a 
quadratic form in €,,, 21, €s; Whose matrix is 


pe OM 0 
0 wq Of, 
LU DP aps 


where the rows correspond to €,,, @;, 3, and the columns also to ¢,;, €:;, €3;. We turn this into 
a quadratic form in e,,, és, only by using the relation 


—_—_ (— p%e,, — q%es1), 
Sng ue q°e31 


which is one of the equations derived by least squares. The quadratic form A then becomes 


Cy = 


prey + Gesy 


p 2 (v® p*\ 1 ( | 2 
+ 7e2, = { p? +=] e3, + pger es: + (G2 +5 | C2 
2nq q°e34 Pp | 11 + PGe11 &31 + 1 2p 31 


prety + 2| 
2 
(p + 2q) +P 
which has the 2 x 2 matrix JP , 
q 
al ee 
>Pq op (2p +q) 
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where the rows correspond to ¢,,, és, respectively, and the columns similarly. Exactly the same 
relationships apply to B and C resulting in the same 2 x 2 matrix. We can therefore write 


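$$\operatorname{var}(x_4) = \Bigl(p^2 + \frac{p^3}{2q}\Bigr)\{p'^2e_{11}^2 + 2p'q'e_{12}^2 + q'^2e_{13}^2\} + pq\{p'^2e_{11}e_{31} + 2p'q'e_{12}e_{32} + q'^2e_{13}e_{33}\}$$
$$\qquad + \Bigl(q^2 + \frac{q^3}{2p}\Bigr)\{p'^2e_{31}^2 + 2p'q'e_{32}^2 + q'^2e_{33}^2\}$$
$$= \Bigl(p^2 + \frac{p^3}{2q}\Bigr)D + pq\,E + \Bigl(q^2 + \frac{q^3}{2p}\Bigr)F, \text{ say.}$$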


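We now apply the same procedure to $D$, $E$, $F$. $D$ becomes

$$\Bigl(p'^2 + \frac{p'^3}{2q'}\Bigr)e_{11}^2 + p'q'e_{11}e_{13} + \Bigl(q'^2 + \frac{q'^3}{2p'}\Bigr)e_{13}^2,$$

$E$ becomes

$$\Bigl(p'^2 + \frac{p'^3}{2q'}\Bigr)e_{11}e_{31} + \tfrac12 p'q'e_{11}e_{33} + \tfrac12 p'q'e_{13}e_{31} + \Bigl(q'^2 + \frac{q'^3}{2p'}\Bigr)e_{13}e_{33},$$

and $F$ becomes

$$\Bigl(p'^2 + \frac{p'^3}{2q'}\Bigr)e_{31}^2 + p'q'e_{31}e_{33} + \Bigl(q'^2 + \frac{q'^3}{2p'}\Bigr)e_{33}^2.$$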


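We have therefore reduced the quadratic form in the nine variables $e_{11}, \ldots, e_{33}$ to one in four
variables $e_{11}, e_{13}, e_{31}, e_{33}$. This is

$$\Bigl(p^2 + \frac{p^3}{2q}\Bigr)\Bigl\{\Bigl(p'^2 + \frac{p'^3}{2q'}\Bigr)e_{11}^2 + p'q'e_{11}e_{13} + \Bigl(q'^2 + \frac{q'^3}{2p'}\Bigr)e_{13}^2\Bigr\}$$
$$+\ pq\Bigl\{\Bigl(p'^2 + \frac{p'^3}{2q'}\Bigr)e_{11}e_{31} + \tfrac12 p'q'e_{11}e_{33} + \tfrac12 p'q'e_{13}e_{31} + \Bigl(q'^2 + \frac{q'^3}{2p'}\Bigr)e_{13}e_{33}\Bigr\}$$
$$+\ \Bigl(q^2 + \frac{q^3}{2p}\Bigr)\Bigl\{\Bigl(p'^2 + \frac{p'^3}{2q'}\Bigr)e_{31}^2 + p'q'e_{31}e_{33} + \Bigl(q'^2 + \frac{q'^3}{2p'}\Bigr)e_{33}^2\Bigr\}.$$

On taking out a factor $1/4pqp'q'$ this agrees with Fisher's result.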
The method of deriving the above can be described in algebraic form. Given the 2 x 2 matrix $T_1$
above and the corresponding matrix $T_2$ corresponding to $p'$, $q'$, we form the direct product, which
is of order 4, and is obtained by replacing each element of $T_2$ by the product of this element
considered as a scalar with the matrix $T_1$.
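The reduction just described can be checked numerically. The sketch below is not part of the original text: the function names are illustrative, exact rational arithmetic is assumed, and the heterozygote deviations are filled in from the least-squares relations quoted above. It compares the variance summed over all nine genotypes with the quadratic form given by the direct product of the two 2 x 2 matrices.

```python
from fractions import Fraction as F
from itertools import product

def reduced_matrix(p):
    """2x2 matrix of the one-locus form in (e_1, e_3) after eliminating e_2."""
    q = 1 - p
    return [[p**2 * (p + 2*q) / (2*q), p*q / 2],
            [p*q / 2, q**2 * (2*p + q) / (2*p)]]

def kron(a, b):
    """Direct (Kronecker) product of two square matrices held as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def full_variance(p, pp, e11, e13, e31, e33):
    """var(x_4) over all nine genotypes; e_rs has r for the first locus, s for the second."""
    q, qq = 1 - p, 1 - pp
    e = {(1, 1): e11, (1, 3): e13, (3, 1): e31, (3, 3): e33}
    for s in (1, 3):        # heterozygote at the first locus
        e[(2, s)] = -(p**2 * e[(1, s)] + q**2 * e[(3, s)]) / (2*p*q)
    for r in (1, 2, 3):     # heterozygote at the second locus
        e[(r, 2)] = -(pp**2 * e[(r, 1)] + qq**2 * e[(r, 3)]) / (2*pp*qq)
    f1 = {1: p**2, 2: 2*p*q, 3: q**2}
    f2 = {1: pp**2, 2: 2*pp*qq, 3: qq**2}
    return sum(f1[r] * f2[s] * e[(r, s)]**2
               for r, s in product((1, 2, 3), repeat=2))

def reduced_variance(p, pp, e11, e13, e31, e33):
    """Quadratic form of the direct product in the order (e11, e13, e31, e33)."""
    m = kron(reduced_matrix(p), reduced_matrix(pp))
    v = [e11, e13, e31, e33]
    return sum(m[i][j] * v[i] * v[j] for i in range(4) for j in range(4))
```

With rational inputs such as `p = F(1, 3)` and `pp = F(2, 5)` the two functions return identical fractions for any choice of the four homozygote deviations.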
The second expression below for this quadratic is verified by expanding all the above terms
and subtracting the terms obtained from the expansion of

$$(p^2p'^2e_{11} - p^2q'^2e_{13} - q^2p'^2e_{31} + q^2q'^2e_{33})^2,$$

whence we obtain four similar expressions of which a typical one is

$$2pqp'^3q'(pe_{11} + qe_{31})^2 + 2pqp'^4(pe_{11} + qe_{31})^2 = 2pqp'^3(pe_{11} + qe_{31})^2,$$

because $p' + q' = 1$. Thus the four similar terms in Fisher's second expression below are correct
although they are only of the seventh degree in $p$, $q$, $p'$, $q'$, and not of the eighth as in the original
quadratic form.


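“Thus the variance, found by squaring the individual variations, is derived from the 3 x 3 table

$$\begin{matrix} p^2 & \cdot & \cdot \\ \cdot & 2pq & \cdot \\ \cdot & \cdot & q^2 \end{matrix}$$

which yields the 2 x 2 table

$$\begin{matrix} \dfrac{p^2}{2q}(p + 2q) & \tfrac12 pq \\ \tfrac12 pq & \dfrac{q^2}{2p}(2p + q) \end{matrix}$$

and the quadratic in $e_{11}, e_{13}, e_{31}, e_{33}$

$$\frac{1}{4pqp'q'}\bigl[(p + 2q)(p' + 2q')p^3p'^3e_{11}^2 + 3 \text{ similar terms} + 2p^2q^2p'^3(p' + 2q')e_{11}e_{31} + 3 \text{ similar terms}$$
$$\qquad +\ 2p^2q^2p'^2q'^2(e_{11}e_{33} + e_{13}e_{31})\bigr],$$

which also takes the form

$$\frac{1}{4pqp'q'}\bigl[(p^2p'^2e_{11} - p^2q'^2e_{13} - q^2p'^2e_{31} + q^2q'^2e_{33})^2 + 2pqp'^3(pe_{11} + qe_{31})^2 + 3 \text{ similar terms}\bigr].$$

The parental table

$$\begin{matrix} \dfrac{p^3}{4q} & -\tfrac14 pq \\ -\tfrac14 pq & \dfrac{q^3}{4p} \end{matrix}$$

yields

$$\frac{1}{16pqp'q'}\bigl[p^2p'^2e_{11} - p^2q'^2e_{13} - q^2p'^2e_{31} + q^2q'^2e_{33}\bigr]^2.”$$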

The parent-offspring table has 9 x 9 = 81 cells. In order to reduce it we use the same kind of
transformations as in discussing the variance. Consider terms of the form $e_{rm}e_{sn}$,
where $m$, $n$ are fixed and $r, s = 1, 2, 3$. From the parent-offspring table (Table C above) with
$P = p^2$, $Q = pq$, $R = q^2$ we obtain the following similar table (Table I), using the fact that the two
loci segregate independently.


TABLE I
                          Parent
Offspring     $e_{1n}$      $e_{2n}$      $e_{3n}$
$e_{1m}$      $p^3$         $p^2q$        $0$
$e_{2m}$      $p^2q$        $pq$          $pq^2$
$e_{3m}$      $0$           $pq^2$        $q^3$
We now use the formulae:

$$e_{2m} = -\frac{1}{2pq}(p^2e_{1m} + q^2e_{3m}), \qquad e_{2n} = -\frac{1}{2pq}(p^2e_{1n} + q^2e_{3n}).$$

We turn the quadratic form above (which has 6 variables if $m \neq n$ and 3 if $m = n$) into a quadratic
form with 4 variables if $m \neq n$ and 2 variables if $m = n$. We then get the array given in Table J,


TABLE J
              $e_{1n}$          $e_{2n}$     $e_{3n}$
$e_{1m}$      $p^3/4q$          $0$          $-\tfrac14 pq$
$e_{2m}$      $0$               $0$          $0$
$e_{3m}$      $-\tfrac14 pq$    $0$          $q^3/4p$


which we can regard as a 2 x 2 table. Using this and the similar table with $p'$, $q'$, the direct
product of the two matrices gives an array whose elements are the elements of Table K multiplied
by $1/16pqp'q'$. The corresponding quadratic form is obviously

$$\frac{1}{16pqp'q'}\{p^2p'^2e_{11} - p^2q'^2e_{13} - q^2p'^2e_{31} + q^2q'^2e_{33}\}^2.$$
TABLE K

             $e_{11}$               $e_{31}$               $e_{13}$               $e_{33}$
$e_{11}$     $p^4p'^4$              $-p^2q^2p'^4$          $-p^4p'^2q'^2$         $p^2q^2p'^2q'^2$
$e_{31}$     $-p^2q^2p'^4$          $q^4p'^4$              $p^2q^2p'^2q'^2$       $-q^4p'^2q'^2$
$e_{13}$     $-p^4p'^2q'^2$         $p^2q^2p'^2q'^2$       $p^4q'^4$              $-p^2q^2q'^4$
$e_{33}$     $p^2q^2p'^2q'^2$       $-q^4p'^2q'^2$         $-p^2q^2q'^4$          $q^4q'^4$
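The passage from Table I to Table J is a change of variables, and can be reproduced mechanically. The following is a minimal sketch, not taken from the original, with illustrative function names and exact rational arithmetic assumed: it substitutes the least-squares relation for the heterozygote into both the row and the column variable of Table I and compares the result with the non-zero part of Table J.

```python
from fractions import Fraction as F

def table_i(p):
    """Single-locus parent-offspring table (Table I), rows and columns e_1, e_2, e_3."""
    q = 1 - p
    return [[p**3, p**2 * q, 0],
            [p**2 * q, p * q, p * q**2],
            [0, p * q**2, q**3]]

def eliminate_heterozygote(table, p):
    """Substitute e_2 = -(p^2 e_1 + q^2 e_3) / (2pq) in rows and columns."""
    q = 1 - p
    # Each row of `sub` expresses one of (e_1, e_2, e_3) in terms of (e_1, e_3).
    sub = [[F(1), F(0)],
           [-p**2 / (2*p*q), -q**2 / (2*p*q)],
           [F(0), F(1)]]
    return [[sum(sub[r][i] * table[r][s] * sub[s][j]
                 for r in range(3) for s in range(3))
             for j in range(2)] for i in range(2)]

def table_j(p):
    """The 2x2 array of Table J with the empty e_2 row and column dropped."""
    q = 1 - p
    return [[p**3 / (4*q), -p*q / 4],
            [-p*q / 4, q**3 / (4*p)]]
```

For any rational gene frequency, for example `p = F(2, 7)`, `eliminate_heterozygote(table_i(p), p)` and `table_j(p)` agree exactly.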

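“and the fraternal table

$$\begin{matrix} \dfrac{p^2}{4q} & \cdot \\ \cdot & \dfrac{q^2}{4p} \end{matrix}$$

leads us to the simple expression

$$\frac{1}{16pqp'q'}\bigl[p^3p'^3e_{11}^2 + p^3q'^3e_{13}^2 + q^3p'^3e_{31}^2 + q^3q'^3e_{33}^2\bigr].”$$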
Applying the same argument to the brother-brother table (Table E) and eliminating $e_{2m}$ and
$e_{2n}$ from the corresponding quadratic form we get the array of Table L, which combined with the
similar result for $p'$, $q'$ gives

$$\frac{1}{16pqp'q'}\{p^3p'^3e_{11}^2 + p^3q'^3e_{13}^2 + q^3p'^3e_{31}^2 + q^3q'^3e_{33}^2\}.$$

TABLE L
              $e_{1n}$       $e_{3n}$
$e_{1m}$      $p^2/4q$       $0$
$e_{3m}$      $0$            $q^2/4p$


“For uncles and cousins we obtain respectively $\tfrac14$ and $\tfrac1{16}$ of the parental contribution, while for double
cousins the table


and a quadratic similar to that for the variance.” 


The same technique is applied to the uncle-nephew table to give Table M. 


TABLE M
              $e_{1n}$            $e_{3n}$
$e_{1m}$      $p^3/8q$            $-\tfrac18 pq$
$e_{3m}$      $-\tfrac18 pq$      $q^3/8p$

Since this is one-half of the corresponding table for parent-offspring, and the same factor arises
at the second locus, the epistatic contribution to the covariance between uncle and nephew is
$\tfrac14$ that of the epistatic component in parent-offspring covariance.


Cousins and double cousins are then easily treated in the same way. 


“9. With assortative mating all these coefficients will be modified. There will be association between 
similar phases of different factors, so that they cannot be treated separately. There will also be an increase in 
the variance. 

“We must determine the nature of the association between different factors, and ascertain how it is related
to the degree of assortative mating necessary to maintain it. Then we shall be able to investigate the statistical
effects of this association on the variance of the population and on the correlations.

“If $\mu$ be the marital correlation, then in a population with variance $V$ the frequency of individuals in the
range $dx$ is

$$\frac{1}{\sqrt{2\pi V}}\,e^{-x^2/2V}\,dx = M,$$

and the frequency in the range $dy$ is

$$\frac{1}{\sqrt{2\pi V}}\,e^{-y^2/2V}\,dy = N;$$

but the frequency of matings between these two groups is not simply $MN$, as would be the case if there were
no marital correlation, but

$$\frac{1}{2\pi V\sqrt{1-\mu^2}}\exp\Bigl\{-\frac{x^2 - 2\mu xy + y^2}{2V(1-\mu^2)}\Bigr\}\,dx\,dy,$$

which is equal to

$$\frac{MN}{\sqrt{1-\mu^2}}\exp\Bigl\{-\frac{\mu^2x^2 - 2\mu xy + \mu^2y^2}{2V(1-\mu^2)}\Bigr\}.$$

“In studying the effect of assortative mating we shall require to know the frequency of matings between two
groups, each with a variance nearly equal to that of the whole population, but centred about means $a$ and $b$.
The frequencies of such groups in any ranges $dx$, $dy$ can be written down, and if the chance of any mating
depends only on $x$ and $y$, the frequency of mating between these two groups can be expressed as a double
integral. If $M$ and $N$ are the frequencies in the two groups, the frequency of mating between them is found to be

$$MN\,e^{\mu ab/V}.”$$

The idea in the above section is that non-randomness in mating is due to a tendency for the
biometric measurements in the two mating individuals to be correlated. Suppose that these two
measurements are $x$ and $y$, and that since they are the result of a large number of independently
segregating factors they can be supposed to be normally distributed. It is assumed that there is
no epistasis. We take their means as zero and their variances as $V$. The probabilities that they lie
respectively in ranges $(x, x + dx)$ and $(y, y + dy)$ are taken as $M$ and $N$, and it is assumed that their
joint probability distribution is given by the bivariate normal distribution above with $\mu$ as
correlation coefficient.

The expression

$$(1-\mu^2)^{-1/2}\exp\Bigl\{-\frac{\mu^2x^2 - 2\mu xy + \mu^2y^2}{2V(1-\mu^2)}\Bigr\}$$

can be regarded as a weighting factor giving the relative probability of a mating between two
particular individuals which are known to have the measurements $x$ and $y$.

Now suppose that these two individuals are chosen at random out of normal populations which
are known to have the means $a$ and $b$ respectively and variances equal to $V$. Their relative
probability of mating is given by

$$\frac{1}{2\pi V\sqrt{1-\mu^2}}\iint\exp\Bigl\{-\frac{(x-a)^2 + (y-b)^2}{2V} - \frac{\mu^2x^2 - 2\mu xy + \mu^2y^2}{2V(1-\mu^2)}\Bigr\}\,dx\,dy$$
$$= \frac{1}{2\pi V\sqrt{1-\mu^2}}\iint\exp\Bigl\{-\frac{W}{2V(1-\mu^2)}\Bigr\}\,dx\,dy,$$

where $W = (x - m_1)^2 - 2\mu(x - m_1)(y - m_2) + (y - m_2)^2 + K$,
where $m_1 = a + \mu b$, $m_2 = b + \mu a$ and $K = -2\mu ab(1-\mu^2)$.
Integrating out, the expression becomes

$$\exp(\mu ab/V),$$

as required. Notice that this result is exact so long as the bivariate distribution is truly normal.
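The exactness of this result can be illustrated numerically. The sketch below is not part of the original: the function names, grid size and integration range are arbitrary assumptions. It averages the weighting factor over two independent normal populations with means $a$ and $b$ by a midpoint rule and can be compared with $\exp(\mu ab/V)$.

```python
import math

def mating_weight(x, y, mu, V):
    """(1 - mu^2)^(-1/2) exp{-(mu^2 x^2 - 2 mu x y + mu^2 y^2) / (2V(1 - mu^2))}."""
    quad = mu**2 * x**2 - 2*mu*x*y + mu**2 * y**2
    return math.exp(-quad / (2*V*(1 - mu**2))) / math.sqrt(1 - mu**2)

def normal_pdf(x, mean, V):
    return math.exp(-(x - mean)**2 / (2*V)) / math.sqrt(2*math.pi*V)

def mean_weight(a, b, mu, V, half_width=10.0, n=400):
    """Midpoint-rule average of the weight over independent N(a, V) and N(b, V)."""
    s = math.sqrt(V)
    cx, cy = a + mu*b, b + mu*a      # centre of the integrand (m_1, m_2 in the text)
    h = 2 * half_width * s / n       # grid spacing
    xs = [cx - half_width*s + (k + 0.5)*h for k in range(n)]
    ys = [cy - half_width*s + (k + 0.5)*h for k in range(n)]
    fx = [normal_pdf(x, a, V) for x in xs]
    fy = [normal_pdf(y, b, V) for y in ys]
    total = 0.0
    for x, wx in zip(xs, fx):
        for y, wy in zip(ys, fy):
            total += wx * wy * mating_weight(x, y, mu, V)
    return total * h * h
```

For example, `mean_weight(0.5, -0.3, 0.3, 1.0)` should agree with `math.exp(0.3 * 0.5 * (-0.3) / 1.0)` to within the accuracy of the quadrature.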
When $\mu \neq 0$ the non-randomness of the mating has two effects: (1) The Hardy-Weinberg
equilibrium for each individual locus is destroyed. (2) The zygotic frequencies for different
factors are no longer independent. Hence the average values of individuals of the form $A_1A_1$,
$A_1A_2$ and $A_2A_2$ cannot be taken as $i$, $j$ and $k$, but depend also on other loci. It is this which
introduces the essential complication and requires the introduction of the condition that the
population is stationary.

Fisher now investigates the effect of assortative mating on the genotype frequencies, using the 
condition that these frequencies are the same in the offspring generation as in the parent genera- 
tion. This implies that the probability of being a parent is independent of genotype so that there 
are no selective differences. It is possible to devise schemes of assortative mating in which, for 
example, the extreme types are less likely to find suitable mates. In such a case the distribution 
amongst the offspring of all the matings would be that of the population as a whole but not the 
same as that amongst ‘parents’. 

We first consider the effect on the frequencies of the three phases of a single factor. Write
$D$, $H$ and $R$ for $A_1A_1$, $A_1A_2$ and $A_2A_2$. Consider the effects of the various types of mating listed
below:

Mating      Offspring                                      Mating      Offspring
D x D       $D$                                            H x H       $\tfrac14 D + \tfrac12 H + \tfrac14 R$
D x H       $\tfrac12 D + \tfrac12 H$                      H x R       $\tfrac12 H + \tfrac12 R$
D x R       $H$                                            R x R       $R$


The first two and last two of these matings will, in an indefinitely large population, produce no 
change in zygotic frequency since the relative proportions of the phases in the offspring are the 
same as those of the parents. 

Out of all possible matings let the frequencies of mating D x R and H x H be $f_1$ and $f_2$ respectively.
Then the contribution of these matings to the next generation will be such that $D$, $H$, $R$ are in the
ratios

$$\tfrac14 f_2 : f_1 + \tfrac12 f_2 : \tfrac14 f_2,$$

whilst the proportion of $D$, $H$ and $R$ amongst the mates entering into these matings is $\tfrac12 f_1 : f_2 : \tfrac12 f_1$.
If these ratios are to be the same we must have $f_2 = 2f_1$. Let $I$, $J$, $K$ be the means of the character
in the individuals which are $D$, $H$ and $R$. Since there are supposed to be many loci contributing
to the character, the contribution of any one locus to the whole character is small, so that

$$I/V^{1/2}, \quad J/V^{1/2}, \quad K/V^{1/2}$$

are all small and the variance of the character for a given phase at this locus is practically $V$.
Hence the frequencies of these matings are proportional to

$$e^{\mu IK/V} \text{ and } e^{\mu J^2/V},$$

by the above theory, and $IK/V$, $J^2/V$ are quantities of the second order of smallness. Hence to
a high degree of approximation $4Q^2e^{\mu J^2/V} = 4PR\,e^{\mu IK/V}$.
Expanding the exponentials and neglecting squares of $J^2/V$ and $IK/V$, we get

$$PR - Q^2 = (\mu/V)\{Q^2J^2 - PR\,IK\} = \mu(Q^2/V)\{J^2 - IK\},$$

on observing that $PR - Q^2$ is of the second order of smallness. Note that this is an approximation.


We now put $P = p^2 + \delta_1$, $Q = pq + \delta_2$, $R = q^2 + \delta_3$.
Then $\delta_1 + 2\delta_2 + \delta_3 = 0$. The gene frequency of $A_1$ must be

$$p = P + Q = p^2 + pq + \delta_1 + \delta_2 = p + \delta_1 + \delta_2$$

and hence $\delta_1 + \delta_2 = 0$. Similarly, $\delta_2 + \delta_3 = 0$. Hence

$$\delta_1 = -\delta_2 = \delta_3 = \delta, \text{ say.}$$

If (XIII) is to be satisfied with $(J^2 - IK)V^{-1}$ small, $\delta$ must be small also, and substituting we get

$$(p^2 + \delta)(q^2 + \delta) - (pq - \delta)^2 = (pq - \delta)^2\mu(J^2 - IK)/V,$$

and to a first approximation $\delta = p^2q^2\mu(J^2 - IK)/V$,
so that (XIV) follows.


The deviations in $P$, $Q$, $R$ from the values they would have if the Hardy-Weinberg equation
held are of the second order of smallness when

$$I/V^{1/2}, \quad J/V^{1/2}, \quad K/V^{1/2}$$

are regarded as being of the first order of smallness. Equation (XV) follows from the definition of
$I$, $J$, $K$ and we can use this to eliminate $J$. The first approximation to $J$ is got by putting $p^2$, $pq$,
$q^2$ for $P$, $Q$, $R$ so that

$$J = -\frac{1}{2pq}(p^2I + q^2K),$$

and putting this in the expression for $\delta$ we get

$$\delta = \frac{\mu}{4V}\{(p^2I + q^2K)^2 - 4p^2q^2IK\} = \frac{\mu}{4V}(p^2I - q^2K)^2.$$
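The last step is a one-line algebraic identity, which the following sketch confirms with exact fractions. It is an illustration added here, with hypothetical function names, and it assumes the first approximation for $J$ given above.

```python
from fractions import Fraction as F

def delta_via_J(p, mu, V, I, K):
    """delta = p^2 q^2 mu (J^2 - IK) / V, with J = -(p^2 I + q^2 K) / (2pq)."""
    q = 1 - p
    J = -(p**2 * I + q**2 * K) / (2*p*q)
    return p**2 * q**2 * mu * (J**2 - I*K) / V

def delta_closed_form(p, mu, V, I, K):
    """delta = mu (p^2 I - q^2 K)^2 / (4V)."""
    q = 1 - p
    return mu * (p**2 * I - q**2 * K)**2 / (4*V)

def phase_frequencies(p, delta):
    """P, 2Q, R of (XIV): the heterozygotes lose 2*delta, each homozygote gains delta."""
    q = 1 - p
    return p**2 + delta, 2*(p*q - delta), q**2 + delta
```

The closed form makes it evident that $\delta$ is never negative when $\mu > 0$, so that assortative mating of this kind can only reduce the proportion of heterozygotes.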


“10. We shall apply this expression first to determine the equilibrium value of the frequencies of the three 
phases of a single factor. Of the six types of mating which are possible, all save two yield offspring of the same 
genetic phase as their parents. With the inbreeding of the pure forms D x D and R x R obviously no change is
made, and the same is true of the crosses D x H and R x H, for each of these yields the pure form and the 
heterozygote in equal numbers. On the other hand, in the cross D x R we have a dominant and a recessive 
replaced in the next generation by two heterozygotes, while in the cross H x H half of the offspring return to 
the homozygous condition. For equilibrium the second type of mating must be twice as frequent as the first, 
and, if I, J, and K are the means of the distributions of the three phases, 


$$4Q^2e^{\mu J^2/V} = 4PR\,e^{\mu IK/V}.$$

“Since $J^2/V$ and $IK/V$ are small quantities, we shall neglect their squares, and obtain the equation

$$PR - Q^2 = \mu Q^2\,\frac{J^2 - IK}{V}. \qquad \text{(XIII)}$$

If, as before, the two types of gamete are in the ratio $p : q$, the frequencies of the three phases are expressed
by the equations

$$P = p^2 + p^2q^2\mu\,\frac{J^2 - IK}{V},$$
$$Q = pq - p^2q^2\mu\,\frac{J^2 - IK}{V}, \qquad \text{(XIV)}$$
$$R = q^2 + p^2q^2\mu\,\frac{J^2 - IK}{V}.$$

“It is evident that

$$PI + 2QJ + RK = 0, \qquad \text{(XV)}$$


and this enables us, whenever necessary, to eliminate $J$, and to treat only $I$ and $K$ as unknowns. These can
only be found when the system of association between different factors has been ascertained. It will be
observed that the changes produced in $P$, $Q$, and $R$ are small quantities of the second order: in transforming
the quantity

$$p^2q^2\mu\,\frac{J^2 - IK}{V}$$

we may write $-(p^2I + q^2K)$ for $2pqJ$, leading to the form

$$\frac{\mu}{4V}(p^2I - q^2K)^2,$$


which will be found more useful than the other. 

“11. The nine possible combinations of two factors will not now occur in the simple proportions PP’, 2PQ’, 
etc., as is the case when there is no association: but whatever the nature of the association may be, we shall 
represent it by introducing new quantities, which by analogy we may expect to be small of the second order, 
defined so that the frequency of the type 

$DD'$ is $PP'(1 + f_{11})$,
that of $DH'$ is $2PQ'(1 + f_{12})$,
and that of $DR'$ is $PR'(1 + f_{13})$,
and so on.” 


We now have to study the effect of assortative mating on the joint distribution of pairs of 
factors since such pairs are not now distributed independently of each other. 

Write $D$, $H$, $R$ for the phases of one factor with frequencies $P$, $2Q$, $R$, and $D'$, $H'$, $R'$ with frequencies $P'$, $2Q'$,
$R'$ for the second factor. The joint frequencies can then be expressed by introducing new quantities
$f_{11}, \ldots, f_{33}$ in the manner shown in Table N.

TABLE N
                               2nd factor
1st factor           $D'$                      $H'$                       $R'$
                     $P'$                      $2Q'$                      $R'$
$D$     $P$          $PP'(1 + f_{11})$         $2PQ'(1 + f_{12})$         $PR'(1 + f_{13})$
$H$     $2Q$         $2QP'(1 + f_{21})$        $4QQ'(1 + f_{22})$         $2QR'(1 + f_{23})$
$R$     $R$          $RP'(1 + f_{31})$         $2RQ'(1 + f_{32})$         $RR'(1 + f_{33})$

(Notice that $R$ and $R'$ are used in two different senses.) Since the sums of the rows and the
columns must equal the corresponding row and column frequencies we get (XVI). Since the
first three of these equations when multiplied by $P$, $2Q$, $R$ and added are equal to the second three
multiplied by $P'$, $2Q'$, $R'$ and added, only five of these equations are independent, and so four of the
$f$'s are independent. We take these as $f_{11}$, $f_{13}$, $f_{31}$ and $f_{33}$.


“Formally, we have introduced nine such new unknowns for each pair of factors, but since, for instance,
the sum of the above three quantities must be $P$, we have the six equations

$$P'f_{11} + 2Q'f_{12} + R'f_{13} = 0, \qquad Pf_{11} + 2Qf_{21} + Rf_{31} = 0,$$
$$P'f_{21} + 2Q'f_{22} + R'f_{23} = 0, \qquad Pf_{12} + 2Qf_{22} + Rf_{32} = 0, \qquad \text{(XVI)}$$
$$P'f_{31} + 2Q'f_{32} + R'f_{33} = 0, \qquad Pf_{13} + 2Qf_{23} + Rf_{33} = 0,$$

five of which are independent. The unknowns are thus reduced to four, and we shall use $f_{11}$, $f_{13}$, $f_{31}$, $f_{33}$, since
any involving a 2 in the suffix can easily be eliminated.

$$I = i + \sum(P'i'f_{11} + 2Q'j'f_{12} + R'k'f_{13}),$$
$$J = j + \sum(P'i'f_{21} + 2Q'j'f_{22} + R'k'f_{23}), \qquad \text{(XVII)}$$
$$K = k + \sum(P'i'f_{31} + 2Q'j'f_{32} + R'k'f_{33}),$$

in which the summation is extended over all the factors except that one to which $i$, $j$, $k$ refer. Since we are
assuming the factors to be very numerous, after substituting their values for the $f$'s we may without error
extend the summation over all the factors. The variance defined as the mean square deviation may be evaluated
in terms of the $f$'s,

$$V = \sum(Pi^2 + 2Qj^2 + Rk^2) + 2\sum\{PP'(1 + f_{11})ii' + 8 \text{ other terms}\},$$

which reduces to

$$\sum(Pi^2 + 2Qj^2 + Rk^2) + 2\sum\{PP'f_{11}ii' + 8 \text{ other terms}\},$$

so that

$$V = \sum(PiI + 2QjJ + RkK).” \qquad \text{(XVIII)}$$


We are assuming no epistasis, but the non-randomness of mating makes the average value of
individuals which are $D$, $H$ and $R$ for some particular locus not equal to $i$, $j$, $k$, which are the values
they would have if the genes at the other loci were fixed. Thus if there are just two loci the average
value of individuals which are $D$ for the first locus is got by averaging the deviations of $DD'$,
$DH'$, $DR'$, and so is

$$P^{-1}\{(i + i')PP'(1 + f_{11}) + (i + j')2PQ'(1 + f_{12}) + (i + k')PR'(1 + f_{13})\}$$
$$= P^{-1}\{i(PP'(1 + f_{11}) + 2PQ'(1 + f_{12}) + PR'(1 + f_{13})) + P(i'P' + 2j'Q' + k'R')$$
$$\qquad +\ P(i'P'f_{11} + 2j'Q'f_{12} + k'R'f_{13})\}.$$

By (XVI), the definition of $i'$, $j'$, $k'$, and of $f_{11}$, $f_{12}$, $f_{13}$ this is equal to $i + (i'P'f_{11} + 2j'Q'f_{12} + k'R'f_{13})$,
and summing over all other factors we obtain (XVII). Notice that we then have

$$PI + 2QJ + RK = 0.$$


Suppose that the biometric measurement can be written as the sum, $\sum X_i$, of a large number of
factors. By definition the mean value of each $X_i$ is zero and the variance is

$$E\Bigl(\sum X_i\Bigr)^2 = \sum E(X_i^2) + \sum_{i \neq j} E(X_iX_j),$$

and inserting the above values we get

$$\sum(Pi^2 + 2Qj^2 + Rk^2) + 2\sum\{PP'(1 + f_{11})ii' + 2PQ'(1 + f_{12})ij' + PR'(1 + f_{13})ik' + 2QP'(1 + f_{21})ji'$$
$$\qquad +\ 4QQ'(1 + f_{22})jj' + 2QR'(1 + f_{23})jk' + RP'(1 + f_{31})ki' + 2RQ'(1 + f_{32})kj' + RR'(1 + f_{33})kk'\}.$$

The second summation is taken over all distinct pairs of factors. The terms within the second
summation not involving $f$'s add to zero, and using (XVII) we obtain (XVIII).


“12. We can only advance beyond these purely formal relations to an actual evaluation of our unknowns
by considering the equilibrium of the different phase combinations. There are forty-five possible matings of
the nine types, but since we need only consider the equilibrium of the four homozygous conditions, we need
only pick out the terms, ten in each case, which give rise to them. The method will be exactly the same as we
used for a single factor. Thus the matings $DD' \times DD'$ have the frequency

$$PP' \cdot PP' \cdot (1 + f_{11})(1 + f_{11})\exp\{\mu(I + I')^2/V\},$$

which for our purpose is equal to $P^2P'^2(1 + 2f_{11} + (\mu/V)(I + I')^2)$.


The number of possible pairs of phases is $9 + \tfrac12(9)(8) = 45$, but we only need to consider the four
homozygous types. Then a mating of type $DD' \times DD'$ will have a relative frequency

$$\{PP'(1 + f_{11})\}^2\exp\{\mu(I + I')^2/V\},$$

which is approximately $(PP')^2\{1 + 2f_{11} + (\mu/V)(I + I')^2\}$.


We consider all the matings which give rise to the four homozygous types and it is sufficient,
by symmetry, to consider the terms which give rise to $DD'$. For single factors the only matings
which give $D$ are D x D, D x H and H x H. Thus the ten relevant matings with their relative
frequencies and the proportion of $DD'$ in their offspring are given in Table O.

TABLE O
                                                                                     Probability
Mating            Frequency                                                          of $DD'$
$DD' \times DD'$     $(PP')^2\{1 + 2f_{11} + (\mu/V)(I + I')^2\}$                        $1$
$DD' \times DH'$     $2P^2P'Q'\{1 + f_{11} + f_{12} + (\mu/V)(I + I')(I + J')\}$         $\tfrac12$
$DH' \times DH'$     $4P^2Q'^2\{1 + 2f_{12} + (\mu/V)(I + J')^2\}$                       $\tfrac14$
$DD' \times HD'$     $2PQP'^2\{1 + f_{11} + f_{21} + (\mu/V)(I + I')(J + I')\}$          $\tfrac12$
$DD' \times HH'$     $4PQP'Q'\{1 + f_{11} + f_{22} + (\mu/V)(I + I')(J + J')\}$          $\tfrac14$
$DH' \times HH'$     $8PQQ'^2\{1 + f_{12} + f_{22} + (\mu/V)(I + J')(J + J')\}$          $\tfrac18$
$HD' \times HD'$     $4Q^2P'^2\{1 + 2f_{21} + (\mu/V)(J + I')^2\}$                       $\tfrac14$
$HD' \times HH'$     $8Q^2P'Q'\{1 + f_{21} + f_{22} + (\mu/V)(J + I')(J + J')\}$         $\tfrac18$
$HH' \times HH'$     $16Q^2Q'^2\{1 + 2f_{22} + (\mu/V)(J + J')^2\}$                      $\tfrac1{16}$
$DH' \times HD'$     $4PQP'Q'\{1 + f_{12} + f_{21} + (\mu/V)(I + J')(J + I')\}$          $\tfrac14$


In cases where the pairs of mating individuals are different the above frequencies must be 
multiplied by two. Adding all together we obtain the left-hand side of equation (XIX). The fact 
that these together equal the right-hand side expresses the condition that the frequency of DD’ 
does not change from generation to generation. 


“Collecting now all the matings which yield $DD'$, we have for equilibrium

$$P^2P'^2[1 + 2f_{11} + (\mu/V)(I + I')^2] + 2P^2P'Q'[1 + f_{11} + f_{12} + (\mu/V)(I + I')(I + J')]$$
$$+\ 2PQP'^2[1 + f_{11} + f_{21} + (\mu/V)(I + I')(J + I')] + 2PQP'Q'[1 + f_{11} + f_{22} + (\mu/V)(I + I')(J + J')]$$
$$+\ 2PQP'Q'[1 + f_{12} + f_{21} + (\mu/V)(I + J')(J + I')] + P^2Q'^2[1 + 2f_{12} + (\mu/V)(I + J')^2]$$
$$+\ Q^2P'^2[1 + 2f_{21} + (\mu/V)(J + I')^2] + 2PQQ'^2[1 + f_{12} + f_{22} + (\mu/V)(I + J')(J + J')]$$
$$+\ 2Q^2P'Q'[1 + f_{21} + f_{22} + (\mu/V)(J + I')(J + J')] + Q^2Q'^2[1 + 2f_{22} + (\mu/V)(J + J')^2]$$
$$= PP'(1 + f_{11}). \qquad \text{(XIX)}$$

“Now since

$$(P + Q)^2(P' + Q')^2 - PP'(P + 2Q + R)(P' + 2Q' + R') = (Q^2 - PR)P' + (Q'^2 - P'R')P + (Q^2 - PR)(Q'^2 - P'R'),$$

the terms involving only $P$ and $Q$, reduce (XIII) to the second order of small quantities.”


Consider all the terms on the left-hand side of (XIX) which do not involve $f$'s or $\mu$. These sum to
$(P + Q)^2(P' + Q')^2$.

The equation immediately following (XIX) is an algebraic identity. If quantities such as $IV^{-1/2}$
are regarded as being of the first order of smallness, $Q^2 - PR$ and $Q'^2 - P'R'$ are of the second order
of smallness and we can neglect $(Q^2 - PR)(Q'^2 - P'R')$. Hence the difference between the sums of
terms on the left- and right-hand sides not involving $f$'s or $\mu$ is equal to

$$(Q^2 - PR)P' + (Q'^2 - P'R')P = -P'Q^2(\mu/V)(J^2 - IK) - PQ'^2(\mu/V)(J'^2 - I'K')$$

by using (XIII), the error being of the fourth order. There is a misprint in the paper, it being (XIX)
which is reduced and not (XIII). Fisher probably means ‘by (XIII)’.
Next we pick out of (XIX) the terms involving $\mu$ and these sum identically to

$$(\mu/V)\{(P' + Q')(PI + QJ) + (P + Q)(P'I' + Q'J')\}^2.$$
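Both sums can be confirmed by enumeration. The sketch below is an illustration added here, not the authors' method: it uses the observation that each probability in Table O is the product of the chances that the two parents each transmit $D$ and $D'$, and it sums over ordered pairs of parents, which supplies the factor two for unlike matings. Names are hypothetical and the $f$'s are set aside.

```python
from fractions import Fraction as F
from itertools import product

def dd_types(P, Q, Pp, Qp, I, J, Ip, Jp):
    """(frequency with f = 0, chance of transmitting D and D', mean value)."""
    return [
        (P * Pp,     F(1),    I + Ip),   # DD'
        (2 * P * Qp, F(1, 2), I + Jp),   # DH'
        (2 * Q * Pp, F(1, 2), J + Ip),   # HD'
        (4 * Q * Qp, F(1, 4), J + Jp),   # HH'
    ]

def xix_terms(P, Q, Pp, Qp, I, J, Ip, Jp):
    """Terms of (XIX) free of f and mu, and the coefficient of mu/V."""
    types = dd_types(P, Q, Pp, Qp, I, J, Ip, Jp)
    zero = sum(fa * ta * fb * tb
               for (fa, ta, _), (fb, tb, _) in product(types, repeat=2))
    mu_coeff = sum(fa * ta * fb * tb * xa * xb
                   for (fa, ta, xa), (fb, tb, xb) in product(types, repeat=2))
    return zero, mu_coeff
```

Because both results are algebraic identities, they hold for arbitrary rational inputs and not only for values satisfying the equilibrium conditions.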




From this we eliminate $J$ and $J'$ by using the identities

$$PI + 2QJ + RK = 0, \qquad P'I' + 2Q'J' + R'K' = 0.$$

We then obtain

$$(\mu/4V)\{(P' + Q')(PI - RK) + (P + Q)(P'I' - R'K')\}^2 = (\mu/4V)\{p'(PI - RK) + p(P'I' - R'K')\}^2,$$

on writing $p' = P' + Q'$, $p = P + Q$. Expanding the square and subtracting the previously
obtained term, which to this order equals $(\mu/4V)\{p'^2(PI - RK)^2 + p^2(P'I' - R'K')^2\}$, we get

$$(\mu/2V)\,pp'(PI - RK)(P'I' - R'K').$$

Next consider the terms on the left-hand side involving $f$'s. Adding, and using $p$, $p'$, we get

$$2PP'pp'f_{11} + 2PQ'pp'f_{12} + 2QP'pp'f_{21} + 2QQ'pp'f_{22}.$$

We get rid of the suffix 2 by using

$$2Q'f_{12} = -P'f_{11} - R'f_{13},$$
$$2Qf_{21} = -Pf_{11} - Rf_{31},$$
$$4QQ'f_{22} = PP'f_{11} + PR'f_{13} + RP'f_{31} + RR'f_{33},$$

and we obtain $\tfrac12 pp'\{PP'f_{11} - PR'f_{13} - P'Rf_{31} + RR'f_{33}\}$.
Adding this to the term in $\mu$ and equating to the right-hand side we obtain (XIXa). Writing down
the three other equations, and adding and subtracting we get (XX) on using $p + q = p' + q' = 1$.
Substituting back in (XIXa), and putting $P = p^2$, $P' = p'^2$, which we can do to the degree of
approximation to which we are working, we get the four equations (XXI) which give the $f$'s
explicitly.


“$$-(\mu/V)[P'Q^2(J^2 - IK) + PQ'^2(J'^2 - I'K')] = -(\mu/4V)[p'^2(IP - KR)^2 + p^2(I'P' - K'R')^2].$$

Also collecting the terms in $I$ and $J$, we find

$$(\mu/V)[(P' + Q')(IP + JQ) + (P + Q)(I'P' + J'Q')]^2,$$

which yields on eliminating $J$, $(\mu/4V)[p'(IP - KR) + p(I'P' - K'R')]^2$,
while the result of collecting and transforming the terms in $f$ is

$$\tfrac12 pp'[PP'f_{11} - PR'f_{13} - P'Rf_{31} + RR'f_{33}].$$

Hence, if the frequency of the type $DD'$ is unchanged

$$(\mu/2V)\,pp'(IP - KR)(I'P' - K'R') + \tfrac12 pp'[PP'f_{11} - PR'f_{13} - P'Rf_{31} + RR'f_{33}] = PP'f_{11}. \qquad \text{(XIXa)}$$

“Now the corresponding equations for the types $DR'$, $RD'$, $RR'$ may be obtained simply by substituting
$K$ for $I$, $R$ for $P$, and vice versa, as required; and each such change merely reverses the sign of the left-hand
side, substituting $q$ or $q'$ for $p$ or $p'$ as a factor.

Combining the four equations

$$(\mu/2V)(IP - KR)(I'P' - K'R') = \tfrac12[PP'f_{11} - PR'f_{13} - RP'f_{31} + RR'f_{33}] \qquad \text{(XX)}$$

so that the set of four equations

$$(\mu/V)(IP - KR)(I'P' - K'R') = pp'f_{11} = -pq'f_{13} = -qp'f_{31} = qq'f_{33} \qquad \text{(XXI)}$$

gives the whole of the conditions of equilibrium.
“13. Substituting now in (XVII), which we may rewrite,

$$I = i + \sum[P'(i' - j')f_{11} - R'(j' - k')f_{13}],$$
$$K = k + \sum[P'(i' - j')f_{31} - R'(j' - k')f_{33}],$$

we have

$$IP - KR = iP - kR + \sum(\mu/V)(IP - KR)(I'P' - K'R')[p'(i' - j') + q'(j' - k')] = iP - kR + A(IP - KR),$$

where

$$A(1 - A) = (\mu/V)\sum(i'P' - k'R')[p'(i' - j') + q'(j' - k')] = (\mu/V)\sum\beta^2, \text{ since } \beta^2 = \frac{(iP - kR)^2}{2pq},$$

or

$$A(1 - A) = \mu(\tau^2/V). \qquad \text{(XXII)}$$


Using (XVI) we convert (XVII) into

$$I = i + \sum\{P'(i' - j')f_{11} - R'(j' - k')f_{13}\},$$
$$K = k + \sum\{P'(i' - j')f_{31} - R'(j' - k')f_{33}\}.$$

Here the summation is taken over all loci other than the particular one under consideration.
Multiplying by $P$ and $R$, and subtracting we obtain

$$IP - KR = iP - kR + A(IP - KR),$$

where $A = \sum(\mu/V)(I'P' - K'R')\{p'(i' - j') + q'(j' - k')\}$.
In this form the result is not useful since $I'$ and $K'$, which refer to the loci over which the summation
is taken, occur on the right-hand side. We therefore apply the same formula as above to each of
these loci to obtain $I'P' - K'R' = i'P' - k'R' + A(I'P' - K'R')$,
because the summation can be taken over all loci, the contribution of any particular one being
negligible. Then $(1 - A)(I'P' - K'R') = i'P' - k'R'$, and substituting again we get

$$A = \frac{1}{1 - A}\sum\frac{\mu}{V}(i'P' - k'R')\{p'(i' - j') + q'(j' - k')\},$$

so that

$$A(1 - A) = \sum(\mu/V)(i'P' - k'R')\{p'(i' - j') + q'(j' - k')\} \qquad \text{(XXIIa)}$$

and each term in the sum now refers to a single locus. We can therefore drop the dashes. To the
degree of approximation required we can put $P = p^2$, $R = q^2$ and

$$p(i - j) + q(j - k) = pi - qk + (p - q)(1/2pq)(p^2i + q^2k) = (1/2pq)(p^2i - q^2k),$$

so that finally

$$A(1 - A) = \frac{\mu}{V}\sum\frac{(p^2i - q^2k)^2}{2pq}. \qquad \text{(XXIIb)}$$


“It would seem that there is an ambiguity in the value of $A$, so that the same amount of assortative mating
would suffice to maintain two different degrees of association: we have, however, not yet ascertained the value 
of V. Since this also depends upon A, the form of the quadratic is changed, and it will be seen that the 
ambiguity disappears. 

“Supposing $A$ determinate, we may determine the association coefficients $f$ for

$$p^2p'^2f_{11} = \frac{\mu}{(1 - A)^2}\,\frac{(iP - kR)(i'P' - k'R')}{V}\,pp',$$
$$p^2q'^2f_{13} = -\frac{\mu}{(1 - A)^2}\,\frac{(iP - kR)(i'P' - k'R')}{V}\,pq'. \qquad \text{(XXIII)}$$

Hence

$$I = i + \frac{\mu}{(1 - A)^2}\,\frac{iP - kR}{pV}\sum(i'P' - k'R')[p'(i' - j') + q'(j' - k')],$$

and so

$$I = i + \frac{A}{1 - A}\,\frac{iP - kR}{p}.$$

Similarly

$$K = k - \frac{A}{1 - A}\,\frac{iP - kR}{q}, \qquad \text{(XXIV)}$$

and

$$J = j - \frac{A}{1 - A}\,(iP - kR)\,\frac{p - q}{2pq}.”$$

We have

$$\frac{iP - kR}{IP - KR} = 1 - A,$$


and substituting this in (XXI) and multiplying by $pp'$ we get (XXIII) from which (XXIV)
follows by simple substitution using (XXIIa) and (XXIIb).


“So that the sense in which the mean value of the heterozygote is changed by assortative mating depends
only on whether p or q is greater. In spite of perfect dominance, the mean value of the heterozygote will be 
different from that of the dominant phase. 

“The value of the variance deduced from the expression

$$V = \sum(PiI + 2QjJ + RkK)$$

reduces to a similar form. For evidently

$$V - \sigma^2 = \frac{A}{1 - A}\sum(iP - kR)[p(i - j) + q(j - k)].$$

Hence

$$V = \sigma^2 + \frac{A}{1 - A}\,\tau^2. \qquad \text{(XXV)}$$

Therefore the equation for $A$ finally takes the form

$$\mu\tau^2 = VA(1 - A) = A(1 - A)\sigma^2 + A^2\tau^2,$$

and may be otherwise written

$$A^2\epsilon^2 - A\sigma^2 + \mu\tau^2 = 0. \qquad \text{(XXVI)”}$$


Here $\epsilon^2 = \sigma^2 - \tau^2$ as usual. When $A = 0$ the left-hand side is $\mu\tau^2 > 0$. When $A = \mu$ it becomes
$\mu(\mu - 1)(\sigma^2 - \tau^2)$ which is negative and when $A = 1$ it is still negative, whilst when $A$ is large it is
again positive. Thus the quadratic must have two roots, one in the interval $(0, \mu)$ and the other
greater than unity. $A$ cannot be greater than unity because the right-hand side of (XXIIb) is
positive.
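The location of the two roots is easily illustrated. The sketch below is an addition with a hypothetical function name; it assumes $0 < \mu < 1$ and $\epsilon^2 = \sigma^2 - \tau^2 > 0$, that is, some dominance deviation is present. It solves (XXVI) directly, and the admissible root can then be substituted into (XXV).

```python
import math

def association_roots(sigma2, tau2, mu):
    """Both roots of A^2 eps^2 - A sigma^2 + mu tau^2 = 0, smaller root first.

    Assumes eps^2 = sigma2 - tau2 > 0 and 0 < mu < 1.
    """
    eps2 = sigma2 - tau2
    disc = sigma2**2 - 4 * eps2 * mu * tau2
    root = math.sqrt(disc)
    return (sigma2 - root) / (2 * eps2), (sigma2 + root) / (2 * eps2)

def variance_under_assortment(sigma2, tau2, A):
    """V of (XXV): sigma^2 + A tau^2 / (1 - A)."""
    return sigma2 + A * tau2 / (1 - A)
```

Taking, purely for illustration, $\sigma^2 = 1$, $\tau^2 = 0.6$ and $\mu = 0.3$, the smaller root lies between $0$ and $\mu$ and the larger exceeds unity, and the smaller root satisfies $VA(1 - A) = \mu\tau^2$ with $V$ computed from (XXV).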


“Now, since the left-hand side is negative when $A = 1$, there can be only one root less than unity. Since,
moreover,

$$(\mu - A^2)\tau^2 = (A - A^2)\sigma^2, \qquad \text{(XXVIa)}$$

it is evident that this root is less than $\mu$, and approaches that value in the limiting case when there is no
dominance.

“A third form of this equation is of importance, for

$$\frac{A}{\mu} = \frac{\tau^2 + [A/(1 - A)]\tau^2}{\sigma^2 + [A/(1 - A)]\tau^2}, \qquad \text{(XXVIb)}$$

which is the ratio of the variance without and with the deviations due to dominance.

“14. Multiple Allelomorphism. The possibility that each factor contains more than two allelomorphs
makes it necessary to extend our analysis to cover the inheritance of features influenced by such polymorphic 
factors. In doing this we abandon the strictly Mendelian mode of inheritance, and treat of Galton’s ‘particulate 
inheritance’ in almost its full generality. Since, however, well-authenticated cases of multiple allelomorphism 
have been brought to light by the Mendelian method of research, this generalised conception of inheritance 
may well be treated as an extension of the classical Mendelism, which we have so far investigated. 
“If a factor have a large number, $n$, of allelomorphs, there will be $n$ homozygous phases, each of which is
associated with a certain deviation of the measurement under consideration from its mean value. These
deviations will be written $i_1, i_2, \ldots, i_n$, and the deviations of the heterozygous phases, of which there are
$\tfrac12 n(n - 1)$, will be written $j_{12}$, $j_{13}$, $j_{23}$, and so on. Let the $n$ kinds of gametes exist with frequencies proportional
to $p$, $q$, $r$, $s$, and so on, then when the mating is random the homozygous phases must occur with frequencies
proportional to $p^2, q^2, r^2, \ldots$, and the heterozygous phases to $2pq, 2pr, 2qr, \ldots$.

“Hence, our measurements being from the mean,

$$p^2i_1 + q^2i_2 + r^2i_3 + \ldots + 2pq\,j_{12} + 2pr\,j_{13} + \ldots = 0. \qquad \text{(XII*)}$$

“As before, we define $\alpha^2$ by the equation

$$p^2i_1^2 + q^2i_2^2 + r^2i_3^2 + \ldots + 2pq\,j_{12}^2 + 2pr\,j_{13}^2 + \ldots = \alpha^2,$$

and choosing $l, m, n, \ldots$, so that

$$p^2(2l - i_1)^2 + q^2(2m - i_2)^2 + \ldots + 2pq(l + m - j_{12})^2 + 2pr(l + n - j_{13})^2 + \ldots$$

is a minimum, we define $\beta^2$ by

$$4l^2p^2 + 4m^2q^2 + \ldots + 2pq(l + m)^2 + 2pr(l + n)^2 \ldots = \beta^2,$$

the condition being fulfilled if

$$l = pi_1 + qj_{12} + rj_{13} + \ldots,$$
$$m = pj_{12} + qi_2 + rj_{23} + \ldots,$$

and so on.

“Now

$$\beta^2 = S(4l^2p^2) + S(2pq\{l + m\}^2) = S(2p(1 + p)l^2) + S(4pq\,lm),$$

and since $pl + qm + rn + \ldots = 0$,

$$\beta^2 = S(2pl^2),$$

which may now be written as a quadratic in $i$ and $j$, represented by the typical terms

$$2p^3i_1^2 + 4p^2q\,i_1j_{12} + 2pq(p + q)j_{12}^2 + 4pqr\,j_{12}j_{13}.$$


We assume there are $n$ alleles $A_1, \ldots, A_n$ with frequencies $p, q, r, \ldots$ respectively. The $n$ homozygotes
are $A_1A_1, A_2A_2, A_3A_3, \ldots$
with values $i_1, i_2, i_3, \ldots$
and there are $\tfrac12 n(n - 1)$ heterozygotes $A_1A_2, A_1A_3, \ldots$, whose values are $j_{12}, j_{13}, \ldots$
Put

$$S_1 = p^2(2l - i_1)^2 + \ldots + 2pq(l + m - j_{12})^2 + \ldots$$

where $l, m, n, \ldots$ are to be chosen by least squares to give the linear additive contribution to the
variance. (Fisher uses $S$ without a suffix for summation.)
The minimization equations are typified by

$$0 = \frac14\frac{\partial S_1}{\partial l} = p^2(2l - i_1) + pq(l + m - j_{12}) + pr(l + n - j_{13}) + \ldots$$
$$= p\{l(p + 1) - pi_1 - qj_{12} - rj_{13} - \ldots + qm + rn + \ldots\}$$

and since $p \neq 0$,

$$l - pi_1 - qj_{12} - rj_{13} - \ldots + (pl + qm + rn + \ldots) = 0.$$

Multiplying this equation by $p$, the corresponding equations by $q, r, \ldots$, and adding we get

$$(pl + qm + \ldots) + (p + q + \ldots)(pl + qm + \ldots) = 0$$

because we have defined $i_1, i_2, \ldots$ and $j_{12}, \ldots$ so that the population mean

$$p^2i_1 + q^2i_2 + \ldots + 2pq\,j_{12} + \ldots$$

is zero. Hence $pl + qm + \ldots = 0$,
and $l = pi_1 + qj_{12} + rj_{13} + \ldots$
The linear component of variance is then

$$\beta^2 = 4\{l^2p^2 + m^2q^2 + \ldots\} + 2\{pq(l + m)^2 + \ldots\}.$$

But $pl + qm + \ldots = 0$, and therefore

$$p^2l^2 + q^2m^2 + \ldots + 2pq\,lm + \ldots = 0.$$

Taking twice this from $\beta^2$ we get

$$\beta^2 = (2pl^2 + 2qm^2 + \ldots)(p + q + \ldots) = 2pl^2 + 2qm^2 + \ldots,$$

and inserting the values of $l, m, \ldots$ we get

$$\beta^2 = 2p(pi_1 + qj_{12} + \ldots)^2 + 2q(pj_{12} + qi_2 + rj_{23} + \ldots)^2 + \ldots$$
$$= 2(p^3i_1^2 + q^3i_2^2 + \ldots) + 4(p^2q\,i_1j_{12} + pq^2i_2j_{12} + \ldots) + 2(pq^2j_{12}^2 + pr^2j_{13}^2 + \ldots + p^2q\,j_{12}^2 + \ldots)$$
$$\qquad +\ 4(pqr\,j_{12}j_{13} + \ldots)$$

of which the typical term is that given by Fisher.
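The least-squares solution and the shortened form of $\beta^2$ can be checked for any number of alleles. The sketch below is an illustration only: alleles are indexed from 0, heterozygote values are held in a dictionary keyed by ordered index pairs, and all names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

def centre(freqs, hom, het):
    """Shift the genotype values so that the population mean is zero."""
    n = len(freqs)
    mean = sum(freqs[a]**2 * hom[a] for a in range(n)) + \
        sum(2 * freqs[a] * freqs[b] * het[(a, b)]
            for a, b in combinations(range(n), 2))
    return [h - mean for h in hom], {k: v - mean for k, v in het.items()}

def additive_effects(freqs, hom, het):
    """l_a = p_a i_a + sum over b != a of p_b j_ab (values assumed centred)."""
    n = len(freqs)

    def j(a, b):
        return het[(min(a, b), max(a, b))]

    return [freqs[a] * hom[a] + sum(freqs[b] * j(a, b) for b in range(n) if b != a)
            for a in range(n)]

def beta2_direct(freqs, effects):
    """4 sum p^2 l^2 + 2 sum over pairs pq (l + m)^2, as first defined."""
    n = len(freqs)
    return sum(4 * freqs[a]**2 * effects[a]**2 for a in range(n)) + \
        sum(2 * freqs[a] * freqs[b] * (effects[a] + effects[b])**2
            for a, b in combinations(range(n), 2))

def beta2_short(freqs, effects):
    """The shortened form 2 sum p l^2."""
    return sum(2 * p * l * l for p, l in zip(freqs, effects))
```

With three alleles at frequencies $\tfrac12$, $\tfrac13$, $\tfrac16$ and arbitrary centred genotype values, the weighted sum $pl + qm + rn$ vanishes and the two expressions for $\beta^2$ coincide.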


“Now we can construct an association table for parent and child as in Article 6, though it is now more
complicated, since the $j$'s cannot be eliminated by equation (XII*), and its true representation lies in four
dimensions; the quadratic in $i$ and $j$ derived from it is, however, exactly one half of that obtained above, so that
the contribution of a single factor to the parental product moment is $\tfrac12\beta^2$. Hence the parental correlation is

$$\frac12\,\frac{\tau^2}{\sigma^2},$$

where $\tau$ and $\sigma$ retain their previous meanings.”
The association table between parent and offspring could be written down as a
$\tfrac12 n(n+1) \times \tfrac12 n(n+1)$ table but we need only to write out the typical terms. Part of these can
be obtained from the previous parent-offspring table.

For the parental types we can take $A_1A_1$ and $A_1A_2$. The possible offspring types are then
typified by $A_1A_1$, $A_1A_2$, $A_1A_3$ and $A_2A_3$. The resulting table is shown as Table P. The covariance,


TABLE P

                      Parental type
Offspring type     $A_1A_1$      $A_1A_2$

$A_1A_1$           $p^3$         $p^2q$
$A_1A_2$           $p^2q$        $pq(p+q)$
$A_1A_3$           $p^2r$        $pqr$
$A_2A_3$           $0$           $pqr$


or, as Fisher calls it, the quadratic expression, is then obtained by summing all terms typified by
the above, thus giving

    $p^3 i_1^2 + q^3 i_2^2 + \ldots + 2p^2q\, i_1 j_{12} + \ldots + pq(p+q)\, j_{12}^2 + \ldots + 2pqr\, j_{12} j_{13} + \ldots = \tfrac12\beta^2,$


counting all the terms in their proper multiplicity. 
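A direct enumeration gives the same result without writing out the whole table. The following is a minimal sketch under random mating, with hypothetical frequencies and genotype values and illustrative function names. The parent carries an ordered pair of alleles, passes one of them on with probability one half, and the other parent contributes an independent allele.

```python
from fractions import Fraction as F
from itertools import product

def centred(p, val):
    """Genotype values measured from the population mean."""
    n = len(p)
    mean = sum(p[a] * p[b] * val[a][b] for a, b in product(range(n), repeat=2))
    return [[val[a][b] - mean for b in range(n)] for a in range(n)]

def beta_squared(p, v):
    """Additive variance 2*p*l^2 + 2*q*m^2 + ... with l = p*i_1 + q*j_12 + ..."""
    n = len(p)
    l = [sum(p[s] * v[r][s] for s in range(n)) for r in range(n)]
    return 2 * sum(p[r] * l[r] ** 2 for r in range(n))

def parent_child_covariance(p, v):
    """Sum over parent (a, b) and the allele c supplied by the other parent."""
    n = len(p)
    cov = F(0)
    for a, b, c in product(range(n), repeat=3):
        weight = p[a] * p[b] * p[c]
        # The child is (a, c) or (b, c), each with probability 1/2.
        child_mean = (v[a][c] + v[b][c]) / 2
        cov += weight * v[a][b] * child_mean
    return cov

p = [F(1, 2), F(3, 10), F(1, 5)]          # hypothetical allele frequencies
val = [[3, 1, 4], [1, 5, 9], [4, 9, 2]]   # hypothetical genotype values
v = centred(p, val)
parental = parent_child_covariance(p, v)
half_beta2 = beta_squared(p, v) / 2
```

With these inputs the two quantities agree exactly, which is the statement that the parental product moment for one factor is $\tfrac12\beta^2$.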
“Moreover, from the fraternal table we may obtain a quadratic expression having for its typical terms
Ep*(1 + p)ii + $p*qrey te + p'g(1 +p) ty Jig t PParts Jrss 
2PU1 +p +9+2p9)jio + Pgr(1 + 2p) Fre is + 2PIS) 19 a» 
which, when simplified by removing one quarter of the square of the expression in (XII*), becomes

    $\tfrac14 p^2(1+2p)\, i_1^2 + p^2q\, i_1 j_{12} + \tfrac12 pq(1+p+q)\, j_{12}^2 + pqr\, j_{12} j_{13},$

or, simply, $\tfrac14(\alpha^2 + \beta^2)$.”


The fraternal table is rather more complicated to construct. We start from Table Q which gives 
the possible offspring from all possible types of mating which are 7 in number. 


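TABLE Q

Mating                     Frequency      Offspring
$A_1A_1 \times A_1A_1$     $p^4$          $A_1A_1$
$A_1A_1 \times A_1A_2$     $4p^3q$        $\tfrac12 A_1A_1 + \tfrac12 A_1A_2$
$A_1A_1 \times A_2A_2$     $2p^2q^2$      $A_1A_2$
$A_1A_1 \times A_2A_3$     $4p^2qr$       $\tfrac12 A_1A_2 + \tfrac12 A_1A_3$
$A_1A_2 \times A_1A_2$     $4p^2q^2$      $\tfrac14 A_1A_1 + \tfrac12 A_1A_2 + \tfrac14 A_2A_2$
$A_1A_2 \times A_1A_3$     $8p^2qr$       $\tfrac14 A_1A_1 + \tfrac14 A_1A_2 + \tfrac14 A_1A_3 + \tfrac14 A_2A_3$
$A_1A_2 \times A_3A_4$     $8pqrs$        $\tfrac14 A_1A_3 + \tfrac14 A_1A_4 + \tfrac14 A_2A_3 + \tfrac14 A_2A_4$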


From this table we can pick out the possible pairs of sibs and their relative frequencies, as given 
in Table R, one sib corresponding to the columns and one to the rows. 


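TABLE R

                       $A_1A_1$                   $A_1A_2$
                       $i_1$                      $j_{12}$

$A_1A_1$   $i_1$       $\tfrac14 p^2(1+p)^2$      $\tfrac12 p^2q(1+p)$
$A_1A_2$   $j_{12}$    $\tfrac12 p^2q(1+p)$       $\tfrac12 pq(1+p+q+2pq)$
$A_1A_3$   $j_{13}$    $\tfrac12 p^2r(1+p)$       $\tfrac12 pqr(1+2p)$
$A_2A_3$   $j_{23}$    $\tfrac12 p^2qr$           $\tfrac12 pqr(1+2q)$
$A_3A_3$   $i_3$       $\tfrac14 p^2r^2$          $\tfrac12 pqr^2$
$A_3A_4$   $j_{34}$    $\tfrac12 p^2rs$           $pqrs$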


To illustrate how these frequencies are obtained consider the case where both sibs are $A_1A_1$.
This can happen in the first, second, fifth and sixth type of mating and the total frequency is

    $p^4 + p^3(q + r + \ldots) + \tfrac14 p^2(q^2 + r^2 + \ldots) + \tfrac12 p^2(qr + qs + \ldots + rs + \ldots)$
    $= p^4 + p^3(1-p) + \tfrac14 p^2(1-p)^2 = \tfrac14 p^2(1+p)^2.$

(This is more easily obtained by the Li and Sacks method mentioned before.) Adding together all
the resulting terms we get

    $\tfrac14 p^2(1+p)^2\, i_1^2 + \tfrac14 q^2(1+q)^2\, i_2^2 + \ldots + \tfrac12 p^2q^2\, i_1 i_2 + \tfrac12 p^2r^2\, i_1 i_3 + \ldots$
    $+ p^2q(1+p)\, i_1 j_{12} + p^2r(1+p)\, i_1 j_{13} + p^2qr\, i_1 j_{23} + p^2qs\, i_1 j_{24} + \ldots$
    $+ \tfrac12 pq(1+p+q+2pq)\, j_{12}^2 + \tfrac12 pr(1+p+r+2pr)\, j_{13}^2 + \ldots$
    $+ pqr(1+2p)\, j_{12} j_{13} + pqs(1+2q)\, j_{12} j_{24} + \ldots + 2pqrs\, j_{12} j_{34} + \ldots$
thus agreeing with Fisher’s sum of typical terms except for his fourth term which should read 
Pqrr, jos and not p*grr, jx3- 
The square of the expression in (XII*) is

    $\{p^2 i_1 + q^2 i_2 + \ldots + 2pq\, j_{12} + 2pr\, j_{13} + \ldots\}^2 = 0,$

and subtracting $\tfrac14$ of this from the above we get $\tfrac14(\alpha^2 + \beta^2)$ as stated.
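The fraternal result can also be verified by brute-force enumeration of matings rather than through Table R. The sketch below assumes random mating at a single locus; the four allele frequencies, the genotype values and the function names are hypothetical. It checks both the frequency $\tfrac14 p^2(1+p)^2$ of a pair of $A_1A_1$ sibs and the identity of the sib covariance with $\tfrac14(\alpha^2 + \beta^2)$.

```python
from fractions import Fraction as F
from itertools import product

def centred(p, val):
    """Genotype values measured from the population mean."""
    n = len(p)
    mean = sum(p[a] * p[b] * val[a][b] for a, b in product(range(n), repeat=2))
    return [[val[a][b] - mean for b in range(n)] for a in range(n)]

def alpha_beta(p, v):
    """Total genotypic variance alpha^2 and additive variance beta^2."""
    n = len(p)
    alpha2 = sum(p[a] * p[b] * v[a][b] ** 2 for a, b in product(range(n), repeat=2))
    l = [sum(p[s] * v[r][s] for s in range(n)) for r in range(n)]
    beta2 = 2 * sum(p[r] * l[r] ** 2 for r in range(n))
    return alpha2, beta2

def sib_covariance(p, v):
    """Father (a, b) x mother (c, d); two sibs are independent given the mating."""
    n = len(p)
    cov = F(0)
    for a, b, c, d in product(range(n), repeat=4):
        weight = p[a] * p[b] * p[c] * p[d]
        family_mean = (v[a][c] + v[a][d] + v[b][c] + v[b][d]) / 4
        cov += weight * family_mean ** 2
    return cov

def freq_both_sibs_homozygous(p, k):
    """Frequency with which both sibs are the homozygote for allele k."""
    n = len(p)
    total = F(0)
    for a, b, c, d in product(range(n), repeat=4):
        weight = p[a] * p[b] * p[c] * p[d]
        from_father = F((a == k) + (b == k), 2)
        from_mother = F((c == k) + (d == k), 2)
        total += weight * (from_father * from_mother) ** 2
    return total

p = [F(2, 5), F(3, 10), F(1, 5), F(1, 10)]                          # hypothetical
val = [[3, 1, 4, 1], [1, 5, 9, 2], [4, 9, 6, 5], [1, 2, 5, 3]]      # hypothetical
v = centred(p, val)
alpha2, beta2 = alpha_beta(p, v)
fraternal = sib_covariance(p, v)
both_A1A1 = freq_both_sibs_homozygous(p, 0)
```

Four alleles are used so that terms of the type $j_{12} j_{34}$, which need four distinct alleles, are exercised as well.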
“Here, again, the introduction of multiple allelomorphism does not affect the simplicity of our results; the
correlation between the dominance deviations of siblings is still exactly $\tfrac14$, and the fraternal correlation is
diminished by dominance to exactly one half the extent suffered by the parental correlation. The dominance
ratio plays the same part as it did before, although its interpretation is now more complex. The fraternal
correlation may be written, as in Article 6,


“15. Homogamy and Multiple Allelomorphism. The proportions of these different phases which are in
equilibrium when mating is assortative must now be determined. As in Article 10, let $I_1, I_2, \ldots$ be the mean
deviations of the homozygous phases, and $J_{12}, J_{13}, \ldots$ those of the heterozygous phases. Let the frequency of
the first homozygous phase be written as $p^2(1 + f_{11})$, and the others in the same way. Then, since p is the
frequency of the first kind of gamete,

    $p f_{11} + q f_{12} + r f_{13} + \ldots = 0,$

and

    $p f_{12} + q f_{22} + r f_{23} + \ldots = 0,$

and so on.

“Let

    $p I_1 + q J_{12} + r J_{13} + \ldots = L,$
    $p J_{12} + q I_2 + r J_{23} + \ldots = M,$

and so on, then L, M, ... represent the mean deviations of individuals giving rise to gametes of the different
kinds; hence, by Article 9,

    $2pq(1 + f_{12}) = 2pq\, e^{(\mu/V)LM},$

that is,

    $f_{12} = (\mu/V)\, LM.$    (XIV*)


The aim of paragraph 15 is to extend the treatment of assortative mating in paragraphs 9-13 
to the case where each locus may have more than two alleles, all loci remaining, as before, 
unlinked. Since we are concerned with second-degree statistics (variances and covariances) it is 
sufficient to consider the loci in pairs. 

In the stable population with assortative mating $I_1, I_2, \ldots$ and $J_{12}, J_{13}, \ldots$ are taken as the mean
values of the deviations from the population mean of the respective homozygotes $A_1A_1, A_2A_2, \ldots$
and the heterozygotes $A_1A_2, A_1A_3, \ldots$, with frequencies $p^2(1 + f_{11}), q^2(1 + f_{22}), \ldots$ and $2pq(1 + f_{12})$,
$2pr(1 + f_{13})$, etc. Then the equations such as

    $p f_{11} + q f_{12} + \ldots = 0$

are necessary in order that the gene frequencies amongst all mating pairs should be exactly
p, q, etc.

Notice in particular that $f_{11}, f_{12}, \ldots$ are not analogous to the $f_{11}, \ldots$ used in the previous discussion
of assortative mating where there are only two alleles at each locus. The f’s here refer to a
single locus, and when referring to another locus we shall write $f'_{11}, f'_{12}, \ldots$

The average deviation of the class of individuals which give rise to the gamete $A_1$ will be

    $(1/2p)\{2p^2 I_1 + 2pq\, J_{12} + \ldots\} = L,$    (XIV*a)

to the first approximation, there being further terms involving f’s which we can ignore. By the
type of argument used before we then have

    $2pq(1 + f_{12}) = 2pq \exp[\mu LM/V],$

and

    $f_{12} = \mu LM/V.$


The frequencies of $A_1A_2$ and $B_1B_2$ are $2pq(1 + f_{12})$ and $2p'q'(1 + f'_{12})$, and their joint frequency
which we now want to find is written as

    $4pqp'q'(1 + f_{12.12})$

or as

    $4pqp'q'(1 + f_{12})(1 + f'_{12})(1 + f'_{12.12}).$

Note that the absence or presence of a dash on $f_{12}$ means that $f_{12}$ refers to $A_1A_2$, and $f'_{12}$ to
$B_1B_2$, whilst on the other hand $f_{12.12}$ and $f'_{12.12}$ both refer to both factors together, the difference
being in their definition. Since the f’s are all small we expand the products and neglect small
terms, obtaining

    $f_{12.12} = f_{12} + f'_{12} + f'_{12.12}.$


In the absence of any assortative mating the gametic frequency of $A_1B_1$ would have been $pp'$,
but when $\mu \neq 0$ the proportionate increase, using the definition of $f'_{11.11}$, etc., must be

    $pp' f'_{11.11} + pq' f'_{11.12} + pr' f'_{11.13} + \ldots + qp' f'_{12.11} + qq' f'_{12.12} + qr' f'_{12.13} + \ldots$
    $+ rp' f'_{13.11} + \ldots + rr' f'_{13.13} + \ldots = F_{11}$ by definition.

Thus the frequency of $A_1B_1$ in the gametes is $pp'(1 + F_{11})$. The mean value of individuals giving
rise to this gamete is $L + L'$ by the argument leading to (XIV*a) and so on, so that (XIX*)
follows. In this equation the F’s are functions of the $f'_{11.11}$, etc., which are known when the
frequencies $p, p', \ldots$ are given, and we want to solve for the $f'_{11.12}$, etc.

Fisher guesses that the solutions must be

    $f'_{12.12} = (\mu/V)(L + M)(L' + M')$

and similar formulae. Putting these in the equation for $F_{11}$ we get

    $(\mu/V)\{pp'(2L)(2L') + pq'(2L)(L' + M') + \ldots + qp'(L + M)(2L')$
    $\quad + qq'(L + M)(L' + M') + \ldots + rp'(L + N)(2L') + \ldots\}$
    $= (\mu/V)\{L + pL + qM + \ldots\}\{L' + p'L' + q'M' + \ldots\}$
    $= (\mu/V)\, LL',$

since $pL + qM + \ldots = 0$, $p'L' + q'M' + \ldots = 0$.
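The verification of the guessed solution is a short double sum and can be reproduced numerically. In the sketch below all numbers are hypothetical, k stands for $\mu/V$, and the gametic deviations are centred so that $pL + qM + \ldots = 0$ at each locus, which is the only property the argument uses.

```python
from fractions import Fraction as F

def centre(freq, raw):
    """Shift the deviations so that their frequency-weighted mean is zero."""
    m = sum(f * x for f, x in zip(freq, raw))
    return [x - m for x in raw]

def gametic_excess(p, L, pd, Ld, k, r, t):
    """F_rt: sum over a, b of p_a p'_b f', where the guessed solution is
    f' = k (L_r + L_a)(L'_t + L'_b) for genotype (A_r A_a)(B_t B_b)."""
    return sum(p[a] * pd[b] * k * (L[r] + L[a]) * (Ld[t] + Ld[b])
               for a in range(len(p)) for b in range(len(pd)))

k = F(1, 20)                                   # hypothetical value of mu/V
p = [F(1, 2), F(3, 10), F(1, 5)]               # first locus: p, q, r
L = centre(p, [2, -1, 3])                      # L, M, N with pL + qM + rN = 0
pd = [F(3, 5), F(2, 5)]                        # second locus: p', q'
Ld = centre(pd, [1, -4])                       # L', M' with p'L' + q'M' = 0

excess = {(r, t): gametic_excess(p, L, pd, Ld, k, r, t)
          for r in range(len(p)) for t in range(len(pd))}
```

Each entry of `excess` equals k times the product of the two gametic deviations, the analogue of $F_{11} = (\mu/V)LL'$ for every gametic combination.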


“The association between the phases of two different factors requires for its representation the introduction
of association coefficients for each possible pair of phases. Let the homozygous phases of one factor be
numbered arbitrarily from 1 to m, and those of the other factor from 1 to n, then, as the phase (12) of the first
factor occurs with frequency $2pq(1 + f_{12})$, and of the second factor, with frequency $2p'q'(1 + f'_{12})$, we shall write
the frequency with which these two phases coincide in one individual as $4pqp'q'(1 + f_{12.12})$, or as

    $4pqp'q'(1 + f_{12})(1 + f'_{12})(1 + f'_{12.12}),$

so that

    $f_{12.12} = f'_{12.12} + f_{12} + f'_{12}.$

“The proportional increase of frequency of the gametic combination (1.1) is

    $pp' f'_{11.11} + pq' f'_{11.12} + pr' f'_{11.13} + \ldots + qp' f'_{12.11} + qq' f'_{12.12} + qr' f'_{12.13} + \ldots$

and so on.

“By virtue of the equations connecting the f’s of a single factor, this expression, which we shall term $F_{11}$,
has the same value, whether written with dashed or undashed f’s.

“Individuals having the constitution (12.12) may be formed by the union either of gametes (1.1) and (2.2),
or of gametes (1.2) and (2.1); hence the equations of equilibrium are of the form

    $2f_{12.12} = F_{11} + F_{22} + (\mu/V)(L + L')(M + M') + F_{12} + F_{21} + (\mu/V)(L + M')(M + L'),$

but

    $2f'_{12.12} = 2f_{12.12} - 2f_{12} - 2f'_{12}$
    $= 2f_{12.12} - (2\mu/V)(LM + L'M'),$

therefore

    $2f'_{12.12} = F_{11} + F_{22} + F_{12} + F_{21} + (\mu/V)(L + M)(L' + M').$    (XIX*)

“By analogy with Article 12, the solution

    $f'_{12.12} = (\mu/V)(L + M)(L' + M')$

suggests itself, and on trial it leads to

    $F_{11} = (\mu/V)\, LL',$

and is thereby verified.”
To obtain L we argue as follows. Write the mean deviation of the homozygote $A_1A_1$ from the
population mean as $I_1 = i_1 + i_1^*$ where $i_1$ is the deviation due directly to the genotype $A_1A_1$ if the
other genes were segregating independently of the A locus. $i_1^*$ is then the extra deviation produced
by the other genes in virtue of the association due to homogamy. The homozygote $A_1A_1$ has
frequency

    $p^2(1 + f_{11})$

and the double homozygote $A_1A_1B_1B_1$ has frequency

    $p^2p'^2(1 + f_{11})(1 + f'_{11})(1 + f'_{11.11}).$

From this it follows that the conditional probability that an individual is $B_1B_1$, if it is known
that it is $A_1A_1$, is

    $p'^2(1 + f'_{11})(1 + f'_{11.11}),$

which to a reasonable approximation can be written as

    $p'^2(1 + f'_{11} + f'_{11.11}).$

Similarly, the probability that an individual is $B_1B_2$ when it is known that it is $A_1A_1$, is

    $2p'q'(1 + f'_{12} + f'_{11.12}).$


The total additional contribution of the individuals at the B locus to the measurement of $A_1A_1$
individuals is therefore

    $p'^2(1 + f'_{11} + f'_{11.11})\, i'_1 + q'^2(1 + f'_{22} + f'_{11.22})\, i'_2 + \ldots + 2p'q'(1 + f'_{12} + f'_{11.12})\, j'_{12} + \ldots$

We have already defined the effects $i'_1, j'_{12}$, in such a way that the mean effect is zero, i.e. so that

    $p'^2(1 + f'_{11})\, i'_1 + \ldots + 2p'q'(1 + f'_{12})\, j'_{12} + \ldots = 0.$

Thus the additional increment is simply

    $p'^2 f'_{11.11}\, i'_1 + \ldots + 2p'q' f'_{11.12}\, j'_{12} + \ldots$

We now consider the sum over all loci other than the A locus and we denote this summation
by the symbol $\Sigma$. We get

    $I_1 = i_1 + i_1^* = i_1 + \Sigma\{p'^2 f'_{11.11}\, i'_1 + \ldots + 2p'q' f'_{11.12}\, j'_{12} + \ldots\}$

and similarly

    $J_{12} = j_{12} + \Sigma\{p'^2 f'_{12.11}\, i'_1 + \ldots + 2p'q' f'_{12.12}\, j'_{12} + \ldots\},$

the factor 2 occurring to include two terms of equal value.
Write

    $l = p i_1 + q j_{12} + r j_{13} + \ldots$
    $L = p I_1 + q J_{12} + \ldots$
    $\phantom{L} = l + \Sigma\{p'^2 i'_1 (p f'_{11.11} + q f'_{12.11} + \ldots) + \ldots + 2p'q' j'_{12} (p f'_{11.12} + q f'_{12.12} + \ldots) + \ldots\}$

Using the values of the f’s which we have found above, and

    $pL + qM + \ldots = 0,$

we get

    $p f'_{11.11} + q f'_{12.11} + \ldots = (\mu/V)\, L(2L'),$

and similarly

    $p f'_{11.12} + q f'_{12.12} + \ldots = (\mu/V)\, L(L' + M')$

(these results being misprinted in the text). Using these we get finally

    $L = l + (\mu L/V)\, \Sigma\{(2L')\, p'^2 i'_1 + \ldots + 2(L' + M')\, p'q' j'_{12} + \ldots\}.$

Since

    $l' = p' i'_1 + q' j'_{12} + \ldots$

this can be written

    $L = l + (\mu L/V)\, \Sigma\{2p'l'L' + 2q'm'M' + \ldots\}.$


Since each individual locus is regarded as contributing a vanishingly small component of the
total variance we can now suppose the summation to be taken over all factors at all loci and not
merely all those other than the locus being considered. We can then put $L = l + AL$, where

    $A = (\mu/V)\, \Sigma\{2p'l'L' + 2q'm'M' + \ldots\}.$

Since A is independent of the locus being considered, we also have

    $L' = l' + AL',$

so that

    $l' = (1 - A)L',$

and then

    $A(1 - A) = (\mu/V)\, \Sigma (1 - A)(2p'l'L' + \ldots)$
    $= (\mu/V)\, \Sigma (2p'l'^2 + 2q'm'^2 + \ldots)$
    $= (\mu/V)\, \tau^2.$    (XXII*)
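Equation (XXII*) is a quadratic in A. As a small illustration, the sketch below solves it for given values of $\mu$, $\tau^2$ and V. Two points are assumptions rather than statements from the text: V is treated as a known constant, and the root chosen is the one that vanishes when $\mu = 0$. The numerical inputs are hypothetical.

```python
import math

def association_constant(mu, tau2, V):
    """Root of A(1 - A) = mu * tau2 / V that tends to 0 as mu tends to 0.

    Choosing this branch is an assumption made for the illustration.
    """
    rhs = mu * tau2 / V
    if rhs > 0.25:
        raise ValueError("no real solution: mu * tau2 / V exceeds 1/4")
    return (1 - math.sqrt(1 - 4 * rhs)) / 2

A = association_constant(mu=0.3, tau2=0.6, V=1.0)   # hypothetical inputs
ratio_L_to_l = 1 / (1 - A)                          # from l = (1 - A) L
```

The second quantity is the constant ratio of L to l implied by $l = (1 - A)L$.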


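In a similar way substituting for the f’s in the formulae for $I_1$ and $J_{12}$, and then putting
$L = l(1 - A)^{-1}$, etc. we get

    $I_1 = i_1 + \dfrac{2\mu}{V(1-A)^2}\, \Sigma\{2p'^2\, l l'\, i'_1 + \ldots + 2p'q'\, l(l' + m')\, j'_{12} + \ldots\}$
    $\phantom{I_1} = i_1 + \dfrac{2A}{1-A}\, l,$

and

    $J_{12} = j_{12} + \dfrac{\mu}{V(1-A)^2}\, \Sigma\{2p'^2 (l + m)\, l'\, i'_1 + \ldots + 2p'q' (l + m)(l' + m')\, j'_{12} + \ldots\}$
    $\phantom{J_{12}} = j_{12} + \dfrac{A}{1-A}\, (l + m).$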


“* Hence we may evaluate L, L’,..., for 


Fong Pak 


L= ply +QJygt+rhy t+... += X{ps' (fir 1+ Giga t ++) + 20'OIi0(Phi1.12 + Wiese t +++) Fees 


but PhusuatQhiaiut += (h/V)L(L’+M’), 
therefore

    $L = l + (\mu/V)\, L\, \Sigma\{p'^2 i'_1 (L' + L') + 2p'q' j'_{12}(L' + M') + \ldots\}$
    $\phantom{L} = l + (\mu/V)\, L\, \Sigma(2p'l'L' + 2q'm'M' \ldots).$

“Let $L = l + AL$, then

    $L = l/(1 - A),$

and

    $A = (\mu/V)\, \Sigma(2p'l'L' + 2q'm'M' + \ldots),$

therefore

    $A(1 - A) = (\mu/V)\, \Sigma(2p'l'^2 + 2q'm'^2 + \ldots) = (\mu/V)\, \Sigma\beta^2,$

therefore

    $A(1 - A) = (\mu/V)\, \tau^2$    (XXII*)

so that the association constant, A, appearing now in the constant ratio $l : L$, plays exactly the same part in
the generalised analysis as it did in the simpler case.

“It may now be easily shown that the mean deviations, I and J, may be calculated from the equations

    $I_1 = i_1 + 2Al/(1 - A),$
    $J_{12} = j_{12} + [A/(1 - A)](l + m),$    (XXIV*)

and that the variance reduces, as before, to

    $\sigma^2 + [A/(1 - A)]\tau^2.$    (XXV*)
“16. Coupling. In much modern Mendelian work coupling plays an important part, although the results of
different investigators do not seem as yet to converge upon any one uniform scheme of coupling. The type 
found by Morgan in the American Fruit Fly (Drosophila) is, however, of peculiar simplicity, and may be found 
to be the general type of the phenomenon. 

“An individual heterozygous in two factors may owe its origin to the union of either of two pairs of gametes
either (1.1) x (2.2) or (1.2) x (2.1); when coupling occurs, the gametes given off by such an individual, of all 
these four types, do not appear in equal numbers, preference being given to the two types from which the 
individual took its origin. Thus in a typical case these two types might each occur in 28 per cent of the gametes 
and the other two types in 22 per cent. Coupling of this type is reversible, and occurs with equal intensity 
whichever of the two pairs are supplied by the grandparents. We may have any intensity from zero, when each 
type of gamete contributes 25 per cent to complete coupling, when only the two original types of gamete are 
formed, and the segregation takes place as if only one factor were in action. 

“The above analysis of polymorphic factors enables us to compare these two extreme cases; for there are
9 phase combinations of a pair of dimorphic factors, or, if we separate the two kinds of double heterozygote, 
10, which, apart from inheritance, can be interpreted as the 4 homozygous and the 6 heterozygous phases of 
a tetramorphic factor. The 4 gametic types of this factor are the 4 gametic combinations (1.1), (1.2), (2.1), 
(2.2).” 


This mapping of a system with two factors at each of two loci on to a system with four factors 
at a single locus is particularly interesting and can be illustrated as follows. 

Suppose that at the first locus the two factors denoted by 1 and 2 in the text are $A_1$ and $A_2$,
and similarly $B_1$, $B_2$ at the second locus. The nine phase combinations are then

    $A_1A_1B_1B_1$    $A_1A_1B_1B_2$    $A_1A_1B_2B_2$
    $A_1A_2B_1B_1$    $A_1A_2B_1B_2$    $A_1A_2B_2B_2$
    $A_2A_2B_1B_1$    $A_2A_2B_1B_2$    $A_2A_2B_2B_2$

When linkage (‘coupling’ is Fisher’s term) is considered the double heterozygote $A_1A_2B_1B_2$
really corresponds to two different heterozygotes according as whether $A_1$ and $B_1$ are on the same
chromosome or on different chromosomes. We shall denote these two types by $(A_1B_1)(A_2B_2)$ and
$(A_1B_2)(A_2B_1)$.

Now consider a single locus with four factors $C_1$, $C_2$, $C_3$ and $C_4$. This results in four homozygotes
and six heterozygotes. If we identify the four factors $C_1$, $C_2$, $C_3$ and $C_4$ with the pairs of factors
$A_1B_1$, $A_1B_2$, $A_2B_1$ and $A_2B_2$ respectively we have the following mapping of the two loci situation
on the single locus situation.

    $A_1A_1B_1B_1$         $C_1C_1$        $(A_1B_2)(A_2B_1)$     $C_2C_3$
    $A_1A_1B_1B_2$         $C_1C_2$        $A_1A_2B_2B_2$         $C_2C_4$
    $A_1A_1B_2B_2$         $C_2C_2$        $A_2A_2B_1B_1$         $C_3C_3$
    $A_1A_2B_1B_1$         $C_1C_3$        $A_2A_2B_1B_2$         $C_3C_4$
    $(A_1B_1)(A_2B_2)$     $C_1C_4$        $A_2A_2B_2B_2$         $C_4C_4$


Thus to deal with linkage Fisher considers the two possible extreme cases of no linkage and 
complete linkage when there is assortative mating but, as above, no epistatic effects. 

Case I. Here we have two unlinked loci with $A_1$, $A_2$ at one, and $B_1$, $B_2$ at the other. These have,
as usual, gene frequencies $p, q, p', q'$, respectively, and as before the coefficient of assortative
mating is $\mu$. The mean deviations associated with $A_1A_1$, $A_1A_2$, $A_2A_2$ are again $i, j, k$, and $i', j', k'$
for the other locus.

Let L be the mean deviation produced in the population by a gamete carrying $A_1$, and define
$M, L', M'$, similarly. Thus the mean deviations associated with gametes $A_1B_1$, $A_1B_2$, $A_2B_1$, and
$A_2B_2$ are $L + L'$, $L + M'$ (not $M + M'$ as in Fisher), $M + L'$, and $M + M'$. By the theory given
above the frequencies of these four gametes are

    $pp'\{1 + (\mu/V)LL'\}, \quad pq'\{1 + (\mu/V)LM'\},$
    $qp'\{1 + (\mu/V)ML'\}, \quad qq'\{1 + (\mu/V)MM'\}.$

These are denoted by Fisher as p, q, r, s, written here as $\mathbf{p}, \mathbf{q}, \mathbf{r}, \mathbf{s}$ to distinguish them
from the single-locus frequencies.

The frequency of a double homozygote like $A_1A_1B_1B_1$ is, by the previous argument,

    $p^2p'^2(1 + f_{11.11}) = p^2p'^2\{1 + f_{11} + f'_{11} + f'_{11.11}\}$

approximately, where

    $f_{11} = \mu L^2/V, \quad f'_{11} = \mu L'^2/V, \quad f'_{11.11} = 4\mu LL'/V.$

Thus to this order of approximation the frequency of $A_1A_1B_1B_1$ is

    $p^2p'^2\{1 + (\mu/V)(L^2 + L'^2 + 4LL')\} = \{pp'(1 + (\mu/V)LL')\}^2\, \{1 + (\mu/V)(L + L')^2\}.$


Case II. If there is complete linkage the four pairs $A_1B_1$, $A_1B_2$, $A_2B_1$, $A_2B_2$ each behave like
a single gene. We suppose they each have the frequencies given above as p, q, r, s (written
$\mathbf{p}, \mathbf{q}, \mathbf{r}, \mathbf{s}$ below to distinguish them from the single-locus frequencies). We also
suppose that the deviations produced by these ‘genes’ are the same as occurred in the previous
case so that, for example, a zygote $A_1A_1B_1B_1$ would have the deviation $i + i'$, the genes at any
other loci being held fixed.

We must first investigate whether the genotypic frequencies in the second case will be the same
as in the first. If there is no assortative mating ($\mu = 0$) this is obviously true since the frequency
of $A_1A_1B_1B_1$ in the unlinked system will be $(pp')^2$ which is the square of the frequency, $pp'$, of
the ‘gene’ $A_1B_1$ in the linked system.

We have also seen that assortative mating changes the frequency of gene combinations at any
pair of loci only by a quantity of the first order of smallness. Thus to this degree of approximation
the ‘genotypic’ frequencies should remain the same.

The mean deviation in individuals carrying the gamete $A_1B_1$ will then be the same in both
systems. This is $L + L'$ which Fisher writes as a capital $\mathbf{L}$. The similar result holds for the other
gametes.

The variance, V, in the population in the two cases should also be the same.

Then treating $A_1B_1$, etc., as single genes the frequency of a genotype such as $(A_1B_1)(A_1B_1)$
will be, to the first order of approximation,

    $\mathbf{p}^2\{1 + (\mu/V)\mathbf{L}^2\} = \{pp'(1 + (\mu/V)LL')\}^2\, \{1 + (\mu/V)(L + L')^2\},$


which agrees exactly with the result obtained in Case I above. Thus to this degree of approxima- 
tion, which is as far as Fisher goes in his theory, the two systems of completely unlinked and 
completely linked genes agree as regards the frequencies of occurrence, the magnitudes of the 
effects produced by genes and gene-combinations, and the effect of assortative mating. 

Fisher does not explicitly prove that the correlation between relatives will be the same in the
two cases. To show this it is necessary to show that the values of $\tau^2 = \Sigma\beta^2$ are equal since the
correlations will be later expressed in terms of $\tau^2$, V, $\mu$, and A (A being given by equation (XXII*)).

To prove this we return to the original definition of $\beta^2$. To simplify the notation denote the mean
deviations, i, j, k produced by $A_1A_1$, $A_1A_2$, $A_2A_2$ by $j_{11}$, $j_{12} = j_{21}$, $j_{22}$ (notice the change in notation
from Fisher’s use of these symbols). We shall also write the gene frequencies p, q of $A_1$, $A_2$ as
$p_1$, $p_2$. As before we proceed by fitting ‘representative values’ for which we shall abandon Fisher’s
notation $c + b$, $c$, $c - b$, and write instead

    Representative value for $A_rA_s = x_r + x_s$,

where r, s = 1, 2. These values are to be found by least squares. Neglecting the small changes in
frequency due to assortative mating we have to minimize the sum

    $S_1 = \Sigma_r \Sigma_s\, p_r p_s (j_{rs} - x_r - x_s)^2,$

which is equal to

    $p^2(i - c - b)^2 + 2pq(j - c)^2 + q^2(k - c + b)^2$

in Fisher’s notation. The conditions for a minimum are that

    $-\dfrac{1}{4p_r}\, \dfrac{\partial S_1}{\partial x_r} = \Sigma_s\, p_s (j_{rs} - x_r - x_s) = 0.$


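$\beta^2$ is then defined as the variance of the representative values so that

    $\beta^2 = p^2 i^2 + 2pq\, j^2 + q^2 k^2 - (p^2 i + 2pq\, j + q^2 k)^2$
    $= \Sigma\Sigma\, p_r p_s (x_r + x_s)^2 - \{\Sigma\Sigma\, p_r p_s (x_r + x_s)\}^2$
    $= 2(\Sigma p_r x_r^2 - M^2),$

where $M = \Sigma p_r x_r$.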
The same argument applies if we have multiple allelomorphs $A_r$ ($r = 1, 2, \ldots$), and a similar
definition applies to the alleles at the second locus for which the representative values, $x'_t + x'_u$, are
the solutions of

    $\Sigma_u\, p'_u (j'_{tu} - x'_t - x'_u) = 0.$

Now consider the system with complete linkage so that the ‘alleles’ are $(A_rB_t)$. Since there is no
epistacy, the mean deviation produced by $A_rA_sB_tB_u$ is $j_{rs} + j'_{tu}$, all other genes being fixed. If we
neglect the small deviations in frequency produced by assortative mating we can find a ‘representative
value’, $x_{rt}$, for the ‘gene’ $(A_rB_t)$ by minimizing the sum

    $\Sigma\, p_r p_s p'_t p'_u (j_{rs} + j'_{tu} - x_{rt} - x_{su})^2.$

The conditions for this are, by differentiating,

    $\Sigma_s \Sigma_u\, p_s p'_u (j_{rs} + j'_{tu} - x_{rt} - x_{su}) = 0.$

The solution of these equations is simply $x_{rt} = x_r + x'_t$ as can be verified by substituting these
values and using the previous equations for $x_r$, $x'_t$.

The new value of $\beta^2$ for the system with complete linkage is

    $\beta''^2 = 2\{\Sigma_r \Sigma_t\, p_r p'_t (x_r + x'_t)^2 - [\Sigma_r \Sigma_t\, p_r p'_t (x_r + x'_t)]^2\}$
    $= 2\{\Sigma p'_t\, \Sigma p_r x_r^2 + 2\, \Sigma p_r x_r\, \Sigma p'_t x'_t + \Sigma p_r\, \Sigma p'_t x'^2_t - (\Sigma p'_t\, \Sigma p_r x_r + \Sigma p_r\, \Sigma p'_t x'_t)^2\}$
    $= 2\{\Sigma p_r x_r^2 + 2MM' + \Sigma p'_t x'^2_t - (M + M')^2\}$
    $= 2\{\Sigma p_r x_r^2 - M^2 + \Sigma p'_t x'^2_t - M'^2\}$
    $= \beta^2 + \beta'^2.$
Thus in the sum $\tau^2 = \Sigma\beta^2$ the two terms $\beta^2$ and $\beta'^2$ which occur in the system with unlinked
genes are replaced by a single term $\beta''^2$ in the system with complete linkage, but by the above
equation the value of $\tau^2$ is unchanged. Thus, as will be shown later, the correlations between
relatives are unaffected.
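The additivity of $\beta^2$ under complete linkage is easy to confirm numerically. In the sketch below, which uses hypothetical frequencies and genotype values and illustrative function names, the composite ‘alleles’ $(A_rB_t)$ are built explicitly with frequency $p_r p'_t$, the small frequency changes due to assortative mating being neglected as in the text.

```python
from fractions import Fraction as F

def beta_squared(p, val):
    """Variance of the least-squares representative values x_r + x_s."""
    n = len(p)
    mean = sum(p[r] * p[s] * val[r][s] for r in range(n) for s in range(n))
    # x_r solves sum_s p_s (j_rs - x_r - x_s) = 0.
    x = [sum(p[s] * val[r][s] for s in range(n)) - mean / 2 for r in range(n)]
    M = sum(p[r] * x[r] for r in range(n))
    return 2 * (sum(p[r] * x[r] ** 2 for r in range(n)) - M * M)

def linked_locus(p, val, pd, vald):
    """Treat every pair (A_r, B_t) as one allele of a single composite locus."""
    alleles = [(r, t) for r in range(len(p)) for t in range(len(pd))]
    freq = [p[r] * pd[t] for r, t in alleles]
    # No epistacy: the value of (A_r B_t)(A_s B_u) is j_rs + j'_tu.
    value = [[val[r][s] + vald[t][u] for s, u in alleles] for r, t in alleles]
    return freq, value

p = [F(1, 2), F(3, 10), F(1, 5)]            # hypothetical first locus
val = [[3, 1, 4], [1, 5, 9], [4, 9, 2]]
pd = [F(3, 5), F(2, 5)]                     # hypothetical second locus
vald = [[2, 7], [7, 1]]

beta2 = beta_squared(p, val)
beta2_dashed = beta_squared(pd, vald)
beta2_linked = beta_squared(*linked_locus(p, val, pd, vald))
```

The composite locus has six ‘alleles’ here, and its additive variance equals the sum of the two single-locus values.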

Fisher does not consider what happens with values of the recombination fraction lying between
0 and $\tfrac12$, and he seems to imply that because there is no important difference between the extreme
cases of no linkage and complete linkage it is highly probable that the same results will be obtained
for such intermediate values.

However, there are serious gaps in the argument to be filled before this is demonstrated. It
might be thought that a population in which the recombination value was inside the interval
$(0, \tfrac12)$ could be regarded as a mixture of two populations in one of which linkage is absent and in
the other in which it is complete. Simple calculations show that this is not correct.

When linkage is complete the gene combinations, (A, B,), can be regarded as single genes and 
there are no restrictions on the frequencies which can be assigned to them. In particular we can 
suppose, as above, that (A, B,) has frequency p,p.(1+F,,). But if linkage is not complete the 
frequencies of gene combinations are determined by the properties of the system and can no 
longer be chosen at will. It is therefore of interest to show that the frequency of A, B, can still be 
taken as p,p, in a stable population. 

When the mating is not assortative, and $F_{11} = 0$, this is well known. It can be proved for
assortative mating in the following way. Suppose first that the two loci are unlinked. Then
a double heterozygote, $A_1A_2B_1B_2$, can arise in two ways. Either $A_1B_1$ comes from one parent
and $A_2B_2$ from the other (call this ‘coupling’), or $A_1B_2$ comes from one, and $A_2B_1$ from the other
(‘repulsion’). The frequencies of $A_1B_1$ and $A_2B_2$ are

    $pp'(1 + F_{11}) = pp'\{1 + (\mu LL'/V)\},$

and

    $qq'\{1 + (\mu MM'/V)\}.$

The average deviation of individuals giving rise to $A_1B_1$ is $L + L'$, and that of individuals giving
rise to $A_2B_2$ is $M + M'$. If there was no assortative mating the probability of such a pair of gametes
would be the product of their frequencies. However, with assortative mating this product has
to be multiplied by

    $\exp[(\mu/V)(L + L')(M + M')],$

which can be approximated by

    $1 + (\mu/V)(L + L')(M + M').$

Thus with assortative mating the total probability of such a pair of gametes is

    $pp'qq'\{1 + (\mu/V)(LL' + MM' + (L + L')(M + M'))\}$
    $= pp'qq'\{1 + (\mu/V)(LM + L'M' + (L + M)(L' + M'))\}.$

By symmetry we get the same probability of a union between $A_1B_2$ and $A_2B_1$ so that coupling
and repulsion are equally frequent.

Suppose that we have a stable population in which there is no linkage, and instantaneously
linkage is introduced with recombination fraction c where $0 < c < \tfrac12$. In the immediately following
generation the only effect which could occur would be a change in the proportion of offspring of
double heterozygotes. From a double heterozygote in coupling we get gametes in the proportion

    $\tfrac12(1 - c)\, A_1B_1 + \tfrac12 c\, A_1B_2 + \tfrac12 c\, A_2B_1 + \tfrac12(1 - c)\, A_2B_2,$

and from one in repulsion we get:

    $\tfrac12 c\, A_1B_1 + \tfrac12(1 - c)\, A_1B_2 + \tfrac12(1 - c)\, A_2B_1 + \tfrac12 c\, A_2B_2.$

Since coupling and repulsion heterozygotes have the same phenotype, they have the same
probability of mating with any particular genotype, and since coupling and repulsion are equally
frequent, the gametes produced by all heterozygotes will have frequencies which are the averages
of the above frequencies, i.e.

    $\tfrac14 A_1B_1 + \tfrac14 A_1B_2 + \tfrac14 A_2B_1 + \tfrac14 A_2B_2,$

which is just what happens if $c = \tfrac12$, i.e. when there is no linkage.

Since the introduction of linkage has not changed the frequencies in the next generation the 
population remains stable in all further generations. 
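The averaging step can be written out for any recombination fraction. The sketch below is a minimal illustration with illustrative function names; it pools the gametic output of coupling and repulsion heterozygotes with equal weight, as the argument above requires, and uses exact fractions so that the result does not depend on rounding.

```python
from fractions import Fraction as F

def gametes(phase, c):
    """Gametic output of a double heterozygote with recombination fraction c.

    phase is 'coupling' for (A1B1)(A2B2) and 'repulsion' for (A1B2)(A2B1).
    """
    parental, recombinant = (1 - c) / 2, c / 2
    if phase == "coupling":
        return {"A1B1": parental, "A1B2": recombinant,
                "A2B1": recombinant, "A2B2": parental}
    return {"A1B1": recombinant, "A1B2": parental,
            "A2B1": parental, "A2B2": recombinant}

def pooled_gametes(c):
    """Average output when coupling and repulsion are equally frequent."""
    coupling, repulsion = gametes("coupling", c), gametes("repulsion", c)
    return {g: (coupling[g] + repulsion[g]) / 2 for g in coupling}

pooled = {c: pooled_gametes(c) for c in (F(0), F(1, 10), F(1, 4), F(1, 2))}
```

For every value of c tried, each of the four gametes comes out with pooled frequency one quarter, the same as for unlinked loci.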

The same argument can be used to show that the parent-offspring correlation is independent of
the recombination fraction. It does not show at once that the same is true for sib-sib and more 
distant relationships but this is plausible. Fisher does not discuss these more complicated cases 
in his paper and we do not pursue the matter further. 


“The mean deviations associated with these 4 gametic types are $L + L'$, $M + M'$, ..., and we therefore write

    $\mathbf{L} = L + L', \quad \mathbf{M} = L + M', \quad \mathbf{N} = M + L', \quad \mathbf{O} = M + M'.$

“Further, if these gametic types occur with frequency,

    $\mathbf{p} = pp'\{1 + (\mu/V)LL'\}, \quad \mathbf{q} = pq'\{1 + (\mu/V)LM'\},$
    $\mathbf{r} = qp'\{1 + (\mu/V)ML'\}, \quad \mathbf{s} = qq'\{1 + (\mu/V)MM'\},$

it is clear that the frequencies with which the homozygous phases occur, such as

    $p^2p'^2(1 + f_{11.11}) = p^2p'^2\{1 + (\mu/V)(L^2 + L'^2 + 4LL')\}$
    $= \mathbf{p}^2\{1 + (\mu/V)(L + L')^2\} = \mathbf{p}^2(1 + (\mu/V)\mathbf{L}^2),$

are exactly those produced, if there really were a single tetramorphic factor.

“In the same way the phases heterozygous in one factor also agree, for

    $2p^2p'q'(1 + f_{11.12}) = 2p^2p'q'\{1 + (\mu/V)(L^2 + L'M' + 2L(L' + M'))\}$
    $= 2\mathbf{p}\mathbf{q}\{1 + (\mu/V)(L + L')(L + M')\} = 2\mathbf{p}\mathbf{q}\{1 + (\mu/V)\mathbf{L}\mathbf{M}\}.$

“Finally, taking half the double heterozygotes,

    $2pqp'q'(1 + f_{12.12}) = 2pqp'q'\{1 + (\mu/V)[LM + L'M' + (L + M)(L' + M')]\}$
    $= 2\mathbf{p}\mathbf{s}\{1 + (\mu/V)(L + L')(M + M')\} = 2\mathbf{p}\mathbf{s}\{1 + (\mu/V)\mathbf{L}\mathbf{O}\},$

or, equally,

    $2\mathbf{q}\mathbf{r}\{1 + (\mu/V)(L + M')(M + L')\} = 2\mathbf{q}\mathbf{r}\{1 + (\mu/V)\mathbf{M}\mathbf{N}\}.$

“From this it appears that a pair of factors is analytically replaceable by a single factor if the phase
frequencies be chosen rightly: but the only difference in the inheritance in these two systems is that in the one
case there is no coupling, and in the other coupling is complete. It would appear, therefore, that coupling is
without influence upon the statistical properties of the population.”


Fisher now considers the correlations between individuals in a population in which there is
assortative mating and environmental effects. To do this he uses regression theory. Suppose all
measurements are taken from the mean of the population. Let x be the measurement in one
individual and X in another. The correlation between x and X is then

    $\rho = \mathrm{cov}(x, X)\, \{\mathrm{var}(x)\, \mathrm{var}(X)\}^{-1/2}$
    $= \mathrm{cov}(x, X)/V.$

We suppose so many factors are acting that the joint distribution of x and X is bivariate normal
so that the regression lines are straight. Then the expected value of X for any given x is

    $E(X \mid \text{given } x) = \beta x = \rho x$ and $\rho = x^{-1} E(X \mid \text{given } x).$

Fisher tacitly supposes that the effects of environment can be represented by an addition to the
measurement which is independent of the genetic value so that there is no ‘interaction’ between 
genotype and environment. This ‘environmental deviation’ is supposed to be normally distri- 
buted with zero mean and constant variance, and is not correlated among relatives. Then, 
measuring from the mean, we can write 


x = observed value 
= y (genetic value) + environmental effect 


= z (representative value) + dominance deviation + environmental effect. 


These three terms, the first two of which are sums over the various loci, are mutually uncorrelated.
Thus with a large number of loci, the joint distribution of (x, y, z) is trivariate normal, with
z (representative value), y − z (dominance deviation), and x − y (environmental effect) all
statistically independent. It therefore follows that


    $\mathrm{cov}(x, y) = \mathrm{var}(y) = V,$
    $\mathrm{cov}(x, z) = \mathrm{cov}(y, z) = \mathrm{var}(z),$
var (x) = var (y) + var (7), 


where 7 is the environmental effect. 
Thus for the regression coefficients we find

    $b_{x.y} = b_{x.z} = b_{y.z} = 1.$

Then an increase $\delta z$ in the representative value will on the average increase both the genetic
component y, and the observed measurement x, by $\delta z$. This is also evident from the above
decomposition.


Thus we have

    $b_{y.x} = c_1 \text{ (say)} = \dfrac{\mathrm{cov}(x, y)}{\mathrm{var}(x)} = \dfrac{\mathrm{var}(y)}{\mathrm{var}(x)},$


_ var (z) T* 


and, using (X XVID), bz .y = Cy (say) = var (y) pry Re Pe 


Now let x, y, z be the values for a father, and X, Y, Z, the corresponding values for his son.
The regressions of the values X, Y, Z, on x, y, z will arise in two ways. In the first place, the
partial regression of Z on z, keeping the mother fixed, will be $\tfrac12$ (from the table in section 5).

The dominance deviations (y − z), (Y − Z), and the environmental effects (x − y), (X − Y), are
uncorrelated with each other and with z, Z. Thus it is easy to find the regressions of any of X, Y, Z,
on any of x, y, z.

However, there is a second indirect component of regression arising from the fact that the son’s
Z is correlated with the mother’s representative value, which is in turn correlated with the father’s
because of assortative mating. Fisher now finds this extra component of regression under three
different hypotheses about the nature of assortative mating, namely that the underlying
association is between (1) the observed characters x; (2) the genetic components y; (3) the representative
values z.

Notice that he now uses $\mu$ for the observed correlation between the x values, whereas in the
previous discussion it was the correlation between the y’s.

“17. The effects both of dominance and of environment may be taken into account in calculating the
coefficient of correlation: if we call x the actual height of the individual, y what his height would have been
under some standard environment, and z what his height would have been if in addition, without altering the
extent to which different factors are associated, each phase is given its representative value of Article 5. Then,
since we are using the term environment formally for arbitrary external causes independent of heredity, the
mean x of a group so chosen that y = t for each member will be simply t, but the mean y of a group so chosen
that x = t for each member will be $c_1 t$, where $c_1$ is a constant equal to the ratio of the variance with environment
absolutely uniform to that when difference of environment also makes its contribution. Similarly for the
group z = t, the mean value of y is t, but for the group y = t the mean z is $c_2 t$, where


T2 


Cy 


“Now, we may find the parental and grandparental correlations from the fact that the mean z of any
sibship is the mean z of its parents: but we shall obtain very different results in these as in other cases, 
according to the interpretation which we put upon the observed correlation between parents. For, in the 
first place, this correlation may be simply the result of conscious selection. If the correlation for height stood 
alone this would be the most natural interpretation. But it is found that there is an independent association 
of the length of the forearm:* if it is due to selection it must be quite unconscious, and, as Professor Pearson 
points out, the facts may be explained if to some extent fertility is dependent upon genetic similarity. Thus 
there are two possible interpretations of marital correlations. One regards the association of the apparent 
characteristics as primary: there must, then, be a less intense association of the genotype y, and still less of z. 
The other regards the association as primarily in y or z, and as appearing somewhat masked by environmental 
effects in the observed correlation. In the first place, let us suppose the observed correlation in x to be primary.” 


In the discussion below, assuming this first interpretation of marital correlation, if one parent
has the value x = t, the children will have the value

$$\bar z = c_1 c_2\,\frac{1+\mu}{2}\,t,$$

and not C1 Ce 5) 


as misprinted in the paper. The remainder of the formulae follow. 


“Then if $\mu$ is the correlation for x, $c_1\mu$ will be that for y, and this must be written for $\mu$ in the applications
of the preceding paragraphs. Hence

$$A = c_1 c_2 \mu,$$

and $\mu$, $c_1\mu$, and $A$ are the marital correlations for x, y, and z.

“Since the mean z of a sibship is equal to the mean z of its parents, we may calculate the parental and
grandparental correlations thus: for group chosen so that x = t: mean y, $\bar y = c_1 t$; mean z, $\bar z = c_1 c_2 t$; $\bar x$ of mate
is $\mu t$; $\bar z$ of mate is $c_1 c_2 \mu t$. Therefore $\bar z$ of children is

$$c_1 c_2\,\frac{1+\mu}{2}\,t.$$

“Hence, since there is no association except of z between parents and child, the parental correlation
coefficient is

$$c_1 c_2\,\frac{1+\mu}{2}.$$

Now, since we know the mean z of the children to be

$$c_1 c_2\,\frac{1+\mu}{2}\,t,$$

the mean z of their mates is

$$c_1 c_2\,\frac{1+\mu}{2}\,A\,t,$$


* Pearson and Lee, ‘ On the Laws of Inheritance in Man.’ Biometrika, 2, 374. 


and the grandparental correlation coefficient will be

$$c_1 c_2\,\frac{1+\mu}{2}\cdot\frac{1+A}{2}.$$

Similarly, that for the (n+1)th parent will be

$$c_1 c_2\,\frac{1+\mu}{2}\left(\frac{1+A}{2}\right)^{n},$$

giving the Law of Ancestral Heredity as a necessary consequence of the factorial mode of inheritance.


“18. If we suppose, on the other hand, that the association is essentially in y, the coefficient of correlation
between y of husband and y of wife must be $\mu/c_1$ in order to yield an apparent correlation $\mu$. Also

$$A = \frac{\mu}{c_1}\,c_2.$$

$\mu$ is the observed correlation of x’s. If the structural correlation occurs in the y’s, it must therefore
have value $\mu c_1^{-1}$, so that

$$A = c_2(\mu c_1^{-1})$$

and the argument proceeds as before.


“The parental correlation found as before is now

$$\frac{c_1 c_2 + A c_1}{2},$$

and the higher ancestors are given by the general form

$$\frac{c_1 c_2 + A c_1}{2}\left(\frac{1+A}{2}\right)^{n},$$

although $A$ is now differently related to $c_1$, $c_2$ and $\mu$.

“In the third case, where the essential connection is between z of husband and z of wife—and this is a
possible case if the association is wholly due to selective fertility or to the selection of other features affected
by the same factors—the equation between the correlations for y and z is changed, for now the marital
correlation for y is equal to $Ac_2$ when we retain the definition

$$c_2 = \frac{\tau^2}{\sigma^2 - A\epsilon^2}.$$

“Hence also

$$\mu = A c_1 c_2,$$

and the correlation coefficients in the ancestral line take the general form

$$c_1 c_2\left(\frac{1+A}{2}\right)^{n}.$$


“19. On the first of these theories a knowledge of the marital and the parental correlations should be
sufficient to determine $c_1 c_2$, and thence to deduce the constant ratio of the ancestral coefficients.
Thus for three human measurements:


                      Stature    Span      Forearm
$\mu$                 0.2804     0.1989    0.1977
$p$                   0.5066     0.4541    0.4180
$c_1 c_2$             0.7913     0.7575    0.6980
$A$                   0.2219     0.1507    0.1377
$\tfrac12(1+A)$       0.6109     0.5753    0.5689


These figures are deduced from those given by Pearson and Lee (loc. cit.), neglecting sex distinctions, which
are there found to be insignificant, and taking the weighted means.”


In the table above, $\mu$ is the observed correlation between mates as taken from Pearson and Lee,
and $p$ is the observed parent-offspring correlation. We then find $c_1 c_2$, $A$, and $\tfrac12(1+A)$ from the
formulae

$$p = \tfrac12 c_1 c_2 (1+\mu), \qquad A = c_1 c_2 \mu.$$
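
The arithmetic behind the table can be retraced with a short sketch. It assumes only the two relations of the first hypothesis, p = ½c₁c₂(1+μ) and A = c₁c₂μ, together with the correlations quoted from Pearson and Lee; the function and variable names are illustrative and not part of the original text.

```python
# Observed correlations quoted in the table of Article 19 (Pearson and Lee).
MU = {"stature": 0.2804, "span": 0.1989, "forearm": 0.1977}  # marital, mu
P = {"stature": 0.5066, "span": 0.4541, "forearm": 0.4180}   # parental, p


def first_hypothesis(mu, p):
    """Solve p = c1c2*(1 + mu)/2 and A = c1c2*mu for c1c2, A and (1 + A)/2."""
    c1c2 = 2.0 * p / (1.0 + mu)
    A = c1c2 * mu
    return c1c2, A, 0.5 * (1.0 + A)


for name in MU:
    c1c2, A, ratio = first_hypothesis(MU[name], P[name])
    print(f"{name:8s} c1c2={c1c2:.4f}  A={A:.4f}  (1+A)/2={ratio:.4f}")
```

The recomputed figures agree with the printed table to within a few units in the fourth decimal place.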


“These values for $\tfrac12(1+A)$ agree very satisfactorily with the two ratios of the ancestral correlations which
have been obtained, 0.6167 for eye colour in man, and 0.6602 for coat colour in horses. It is evident that if we
also knew the ratio of the ancestral correlations for these features, we could make a direct determination of
A and ascertain to what extent it is the cause and to what extent an effect of the observed marital correlation.

“20. The correlations for sibs, double cousins, and more distant relations of the same type, in which all the
ancestors of a certain degree are common, may be found by considering the variance of the group of collaterals
descended from such ancestors. The variance of a sibship, for example, depends, apart from environment, only
upon the number of factors in which the parents are heterozygous, and since the proportion of heterozygotes
is only diminished by a quantity of the second order, the mean variance of the sibships must be taken for our
purposes to have the value appropriate to random mating,

$$\tfrac12\tau^2 + \tfrac34\epsilon^2 = \tfrac14 V[2c_2(1-A) + 3(1-c_2)]$$

plus the quantity $(V/c_1) - V$ due to environment. But the variance of the population is $V/c_1$; and the ratio of
the two variances must be $1-f$, where $f$ is the fraternal correlation. Hence

$$f = \tfrac14 c_1(1 + c_2 + 2c_2 A).$$”


Still assuming the first model of correlation basically between the x’s, we have to find the
‘variance of a sibship’. We imagine the number of individuals in a sibship indefinitely increased,
and then the x’s of the resulting individuals will have a distribution with mean $m_s$, say, and
variance $v_s$. Both of these will depend on the genetic character of the parents. The observed value,
x, of a random sib from a random sibship may be decomposed into two parts as

$$x = m_s + (x - m_s),$$

where x and $m_s$ are both random variables. Since in any one sibship we have

$$E(x - m_s) = 0,$$

by definition, we also must have

$$E\{m_s(x - m_s)\} = 0$$

within each sibship, and therefore in the whole population. Thus $m_s$ and $(x - m_s)$ are uncorrelated.
From this it follows that

$$\operatorname{var}(x) = \operatorname{var}(m_s) + \operatorname{var}(x - m_s) = v_x, \text{ say}.$$

Here $\operatorname{var}(x - m_s)$ means the mean value of $(x - m_s)^2$ taken over all sibs in all sibships. It is therefore
the mean value of $v_s$ taken over all sibships and can be written as $\bar v_s$. Then

$$\operatorname{var}(m_s) = v_x - \bar v_s.$$

If x, X are the measurements of a random pair of sibs from a random sibship,

$$\operatorname{cov}(x, X) = \operatorname{cov}(m_s + \{x - m_s\},\; m_s + \{X - m_s\}) = \operatorname{var}(m_s) = v_x - \bar v_s.$$

Thus the sib-sib correlation is

$$f = \frac{\operatorname{cov}(x, X)}{\sqrt{\operatorname{var}(x)\operatorname{var}(X)}} = \frac{v_x - \bar v_s}{\sqrt{v_x v_x}} = 1 - \frac{\bar v_s}{v_x}.$$

This can be written $1 - f = \bar v_s / v_x$.

The variance, $v_s$, within any sibship depends only on segregation within that sibship and
therefore only on those genes for which the parents are heterozygous, since if the parents are
homozygous the effect is to make a constant addition to all sibs alike. But the frequencies of
heterozygotes at any locus are affected by assortative mating only by a small quantity so that
the variance within sibships will be changed by a proportionally small quantity. Thus $\bar v_s$ can be
taken, nearly enough, to have its value for random mating, although $\operatorname{var}(m_s)$ will have to be
changed.

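If there are no environmental effects, and no assortative mating, the correlation between the
sibs is

$$\frac{\tfrac12\tau^2 + \tfrac14\epsilon^2}{\sigma^2}.$$

Thus the covariance between sibs is $\tfrac12\tau^2 + \tfrac14\epsilon^2$,
which will be unaffected by any environmental effects which are such that they are uncorrelated
in the sibs, and the genetic variance within sibships is $\sigma^2 - (\tfrac12\tau^2 + \tfrac14\epsilon^2) = \tfrac12\tau^2 + \tfrac34\epsilon^2$.
We also have

$$V = \operatorname{var}(y) = \sigma^2 + \frac{A}{1-A}\,\tau^2 = \tau^2 + \epsilon^2 + \frac{A}{1-A}\,\tau^2, \qquad c_2 = \frac{\tau^2}{\sigma^2 - A\epsilon^2}.$$

Solving these equations for $\tau^2$ and $\epsilon^2$ we get

$$\tau^2 = V c_2(1-A), \qquad \epsilon^2 = V(1-c_2).$$

From these we have, for the genetic part of the mean variance within sibships,

$$\tfrac12\tau^2 + \tfrac34\epsilon^2 = \tfrac14 V\{2c_2(1-A) + 3(1-c_2)\},$$

to which the environmental variance $V c_1^{-1} - V$ must be added to give $\bar v_s$.
We also have $c_1 = \operatorname{var}(y)/\operatorname{var}(x) = V v_x^{-1}$,
and substituting in the formula for f we get Fisher’s result.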

For double cousins we argue as follows. At any one locus each member of a double cousinship 
may be regarded as having one gene chosen at random from the four carried by his father’s 
parents, and one chosen at random from the four carried by his mother’s parents. The variances 
of the cousins within the cousinship will depend only on the dissimilarities within each of these 
two sets of four genes, and therefore by the same argument as before, will be almost independent 
of assortative mating. 

Let x and X be the observed values of the two double cousins, and f the correlation between
them. The variance of the population, and therefore of x or X, is $V c_1^{-1}$, and the variance due to
environmental effects is $V c_1^{-1} - V$. Then the variance of x − X must be

$$E(x - X)^2 = 2V c_1^{-1}(1 - f)$$

on the one hand, and

$$E(x - X)^2 = 2V(c_1^{-1} - 1) + 2(\sigma^2 - \tfrac14\tau^2 - \tfrac1{16}\epsilon^2)$$

on the other, because the correlation between the genetic components for double cousins is
known to be

$$\frac{1}{4\sigma^2}\left(\tau^2 + \tfrac14\epsilon^2\right).$$

Thus the second term above is the genetic component of variance.

Putting $\sigma^2 = \tau^2 + \epsilon^2$, and substituting for $\tau^2$ and $\epsilon^2$, we get

$$V c_1^{-1}(1 - f) = V(c_1^{-1} - 1) + V\{\tfrac34 c_2(1-A) + \tfrac{15}{16}(1 - c_2)\},$$

so that

$$1 - f = 1 - c_1 + \tfrac34 c_1 c_2(1-A) + \tfrac{15}{16} c_1(1 - c_2),$$

and

$$f = \tfrac1{16} c_1\{1 + 3c_2 + 12 c_2 A\}.$$
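
Both variance arguments, for sibs and for double cousins, follow the same pattern: the correlation is one minus the ratio of the mean variance within the group of collaterals to the population variance. The sketch below checks numerically that this pattern reproduces the two closed forms. It assumes the relations τ² = Vc₂(1−A) and ε² = V(1−c₂) and an environmental variance of V/c₁ − V; the function names and trial values are illustrative only.

```python
def components(c2, A, V=1.0):
    """Additive (tau^2) and dominance (eps^2) parts implied by c2 and A."""
    tau2 = V * c2 * (1.0 - A)
    eps2 = V * (1.0 - c2)
    return tau2, eps2


def sib_from_variances(c1, c2, A, V=1.0):
    """f = 1 - (mean variance within sibships) / (population variance)."""
    tau2, eps2 = components(c2, A, V)
    within = 0.5 * tau2 + 0.75 * eps2 + (V / c1 - V)  # genetic + environment
    return 1.0 - within / (V / c1)


def double_cousin_from_variances(c1, c2, A, V=1.0):
    """Same argument with covariance tau^2/4 + eps^2/16 between cousins."""
    tau2, eps2 = components(c2, A, V)
    sigma2 = tau2 + eps2
    within = (sigma2 - 0.25 * tau2 - eps2 / 16.0) + (V / c1 - V)
    return 1.0 - within / (V / c1)


def sib_closed_form(c1, c2, A):
    return 0.25 * c1 * (1.0 + c2 + 2.0 * c2 * A)


def double_cousin_closed_form(c1, c2, A):
    return c1 * (1.0 + 3.0 * c2 + 12.0 * c2 * A) / 16.0


# Arbitrary trial values, chosen only to exercise the identities.
c1, c2, A = 0.9, 0.8, 0.2
print(sib_from_variances(c1, c2, A), sib_closed_form(c1, c2, A))
print(double_cousin_from_variances(c1, c2, A), double_cousin_closed_form(c1, c2, A))
```

Each pair of printed values should coincide up to rounding error, whatever admissible values of c₁, c₂ and A are tried.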


“In the same way, the variance for a group of double cousins is unaffected by selective mating, and we find
the correlation coefficient for double cousins to be

$$\tfrac1{16} c_1(1 + 3c_2 + 12 c_2 A),$$

showing how the effect of selective mating increases for the more distant kin.

“On the first hypothesis, then, we must write,

$$A = c_1 c_2 \mu, \qquad p = c_1 c_2\,\frac{1+\mu}{2},$$

and

$$f = \tfrac14 c_1\{1 + c_2(1 + 2A)\}.$$


“21. We shall use this formula for the fraternal correlation to estimate the relative importance of dominance
and environment in the data derived from the figures given by Pearson and Lee.

“Assuming as the observed correlations

                 Stature    Span      Cubit
$\mu$            0.2804     0.1989    0.1977
$p$              0.5066     0.4541    0.4180
$f$              0.5433     0.5351    0.4619

we obtain as before

$c_1 c_2$        0.7913     0.7575    0.6980
$A$              0.2219     0.1507    0.1377

and calculating $c_1$ from the formula

$$c_1 = 4f - c_1 c_2(1 + 2A),$$

we obtain the three values

                 1.031      1.155     0.957

with a standard error of 0.072, and a mean of 1.048.”


Presumably by ‘standard error’ Fisher means ‘standard deviation of the observed values’.
However, this is not clear; the standard deviation based on two degrees of freedom would be
0.100, not 0.072, and the standard errors in the next table also do not agree. It is not clear what
precisely is in Fisher’s mind here. He does all his calculations to three or four decimal places,
but he does not give any indication of the accuracy of the correlations on which his calculations
are based, other than the ‘standard errors’ quoted from time to time. These do not seem to be
standard errors in the sense of the term most used nowadays, namely, the standard deviation
of the estimate to be expected in repeated sampling. The text suggests that the three values of $c_1$
for stature, span and cubit respectively were looked upon as if they were three estimates of
some ‘ideal’ or ‘true’ value of $c_1$, differing from this only by random fluctuations.
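
The three values of c₁, their mean, and the standard deviation on two degrees of freedom mentioned above can be recomputed as follows. This is a minimal sketch using the tabulated values of f, c₁c₂ and A and the formula c₁ = 4f − c₁c₂(1+2A); the variable names are illustrative.

```python
import statistics

# Tabulated values for stature, span and cubit (Article 21).
F = [0.5433, 0.5351, 0.4619]      # fraternal correlations
C1C2 = [0.7913, 0.7575, 0.6980]
A = [0.2219, 0.1507, 0.1377]


def c1_from_fraternal(f, c1c2, a):
    """Rearrangement of f = c1*(1 + c2 + 2*c2*A)/4."""
    return 4.0 * f - c1c2 * (1.0 + 2.0 * a)


c1_values = [c1_from_fraternal(f, c, a) for f, c, a in zip(F, C1C2, A)]
mean_c1 = statistics.mean(c1_values)
sd_two_df = statistics.stdev(c1_values)   # divisor n - 1 = 2

print([round(v, 3) for v in c1_values], round(mean_c1, 3), round(sd_two_df, 3))
```

The sample standard deviation comes out close to 0.100 rather than 0.072, which is the discrepancy noted in the commentary.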


“This relatively large standard error, due principally to our comparative ignorance of the fraternal
correlations (errors in $\mu$ have scarcely any effect, and those in p are relatively unimportant), prevents us from
making on a basis of these results a close estimate of the contributions to the total variance of the factors
under consideration.

“Remembering that $c_1$ is intrinsically less than unity, the second value is inexplicably high, whilst the first
and third are consistent with any value sufficiently near to unity. The mean of these results is materially
greater than unity, and therefore gives no support to the supposition that there is any cause of variance in
these growth features other than genetic differences. If this is so, we should put $c_1 = 1$, and compare the
observed values of f with those calculated from the formula

$$4f = 1 + c_2(1 + 2A).$$

“With their standard errors we obtain

                 Stature    Span      Cubit     Standard error
Observed         0.5433     0.5351    0.4619    0.016
Calculated       0.5356     0.4964    0.4726    0.008
Difference       −0.0077    −0.0387   +0.0107   0.018


“The exceptional difference in the fraternal correlations for span might, perhaps, be due to the effects of
epistacy, or it may be that the terms which we have neglected, which depend upon the finiteness of the number
of factors, have some influence. It is more likely, as we shall see, that the assumption of direct sexual selection
is not justified for this feature. Accepting the above results for stature, we may ascribe the following percentages
of the total variance to their respective causes:

                                               %      %
Ancestry                                              54
Variance of sibship:
    $\tfrac12\tau^2$                           31
    $\tfrac34\epsilon^2$                       15
    Other causes                               —
                                                      46
                                                     100
Again it may be divided:
Genotypes ($\sigma^2$):
    Essential genotypes ($\tau^2$)             62
    Dominance deviations ($\epsilon^2$)        21
                                                      83
Association of factors by homogamy                    17
Other causes                                          —
                                                     100


“These determinations are subject, as we have seen, to considerable errors of random sampling, but our
figures are sufficient to show that, on this hypothesis, it is very unlikely that so much as 5 per cent of the total
variance is due to causes not heritable, especially as every irregularity of inheritance would, in the above
analysis, appear as such a cause.

“It is important to see that the large effect ascribed to dominance can really be produced by ordinary
Mendelian factors. The dominance ratio $\epsilon^2/\sigma^2$, which may be determined from the correlations, has its numerator
and denominator composed of elements, $\delta^2$ and $\alpha^2$, belonging to the individual factors. We may thereby
ascertain certain limitations to which our factors must be subject if they are successfully to interpret the
existing results. The values of the dominance ratio in these three cases are found to be:

                   Stature    Span      Cubit     Standard error
Dominance ratio    0.253      0.274     0.336     0.045


“22. The correlations for uncles and cousins, still assuming that the association of factors is due to a direct
selection of the feature x, may be obtained by the methods of Article 14, using the two series already obtained:
that for ancestors

$$c_1 c_2\,\frac{1+\mu}{2}\left(\frac{1+A}{2}\right)^{n},$$

and that for collaterals, like sibs and double cousins, which have all their ancestors of a certain degree in
common,

$$\tfrac14 c_1[1 + c_2(1 + 2A)],$$

$$\tfrac1{16} c_1[1 + 3c_2(1 + 4A)],$$

and so on.

“Thus if a group be chosen so that x = t,

$\bar y$ of group is $c_1 t$,

$\bar z$ of group is $c_1 c_2 t$,

$\bar z$ of sibs is $c_1 c_2\,\frac{1+A}{2}\,t$,

also $\bar x$ of sibs is $\tfrac14 c_1[1 + c_2(1 + 2A)]\,t$,

$\bar x$ of sibs’ mates is $\tfrac14 c_1[1 + c_2(1 + 2A)]\,t\mu$,

$\bar z$ of sibs’ mates is $\tfrac14 c_1[1 + c_2(1 + 2A)]\,tA$.

Hence $\bar z$ of nephews is $\tfrac18 c_1[2c_2(1 + A) + \{1 + c_2(1 + 2A)\}A]\,t$,

giving the correlation

$$c_1 c_2\left(\frac{1+A}{2}\right)^{2} + \tfrac18 c_1 A(1 - c_2).$$

“Again for cousins, if a group be chosen so that x = t, we have

$\bar x$ of uncles is $\left[c_1 c_2\left(\frac{1+A}{2}\right)^{2} + \tfrac18 c_1 A(1 - c_2)\right]t$,

$\bar z$ of uncles is $c_1 c_2\left(\frac{1+A}{2}\right)^{2} t$,

and $\bar z$ of uncles’ mates is $\left[c_1 c_2\left(\frac{1+A}{2}\right)^{2} + \tfrac18 c_1 A(1 - c_2)\right]At$,

hence $\bar z$ of cousins is $\left[c_1 c_2\left(\frac{1+A}{2}\right)^{3} + \tfrac1{16} c_1 A^2(1 - c_2)\right]t$,

giving the correlation

$$c_1 c_2\left(\frac{1+A}{2}\right)^{3} + \tfrac1{16} c_1 A^2(1 - c_2).$$


“The formulae show that these two correlations should differ little from those for grandparent and
great-grandparent, using the values already found, and putting $c_1 = 1$ we have


                     Stature    Span      Cubit
Grandparent          0.3095     0.2612    0.2378
Great-grandparent    0.1891     0.1503    0.1353
Uncle                0.3011     0.2553    0.2311
Cousin               0.1809     0.1445    0.1288
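
The four rows of this table can be retraced with a short sketch. It assumes c₁ = 1, the first-hypothesis values of c₂, μ and A for each measurement, and the coefficients ⅛ and 1/16 in the uncle and cousin formulae (the values for which the recomputed figures match the printed ones); the function names are illustrative.

```python
def ancestral(c1, c2, mu, A, n):
    """Correlation with the (n+1)th ancestor; n = 1 is the grandparent."""
    return c1 * c2 * 0.5 * (1.0 + mu) * (0.5 * (1.0 + A)) ** n


def uncle(c1, c2, A):
    return c1 * c2 * (0.5 * (1.0 + A)) ** 2 + c1 * A * (1.0 - c2) / 8.0


def cousin(c1, c2, A):
    return c1 * c2 * (0.5 * (1.0 + A)) ** 3 + c1 * A ** 2 * (1.0 - c2) / 16.0


# (c2, mu, A) with c1 = 1, taken from the first-hypothesis table.
DATA = {
    "stature": (0.7913, 0.2804, 0.2219),
    "span": (0.7575, 0.1989, 0.1507),
    "cubit": (0.6980, 0.1977, 0.1377),
}

for name, (c2, mu, A) in DATA.items():
    row = (
        ancestral(1.0, c2, mu, A, 1),   # grandparent
        ancestral(1.0, c2, mu, A, 2),   # great-grandparent
        uncle(1.0, c2, A),
        cousin(1.0, c2, A),
    )
    print(name, [round(v, 4) for v in row])
```

The recomputed values agree with the table to within a few units in the fourth decimal place, the uncle and cousin figures falling slightly below those for grandparent and great-grandparent as the text states.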


“23. On the third supposition, that the marital correlation is due primarily to an association in the essential
genotype z, we obtain results in some respects more intelligible and in accordance with our existing knowledge.

“From the fundamental equations

$$\mu = c_1 c_2 A, \qquad p = \tfrac12 c_1 c_2(1 + A),$$

we may deduce

$$c_1 c_2 = 2p - \mu, \qquad A = \mu/(2p - \mu),$$

whence the following table is calculated:

                       Stature    Span      Cubit     Standard error
$\mu$                  0.2804     0.1989    0.1977    0.0304
$p$                    0.5066     0.4541    0.4180    0.0115
$f$                    0.5433     0.5351    0.4619    0.0160
$c_1 c_2$              0.7328     0.7093    0.6383    0.038
$A$                    0.3826     0.2804    0.3097    0.028
$\tfrac12(1+A)$        0.6913     0.6402    0.6549    0.014

and making use of the fraternal correlations to separate $c_1$ and $c_2$, by the equations

$$f = \tfrac14 c_1[1 + c_2(1 + 2A)],$$

or

$$c_1 = 4f - 2p - \mu,$$

we obtain

$c_1$                  0.8796     1.0333    0.8139    0.078
$c_2$                  0.8331     0.6864    0.7842    0.077
$\epsilon^2/\sigma^2$  0.2450     0.3883    0.2850    0.105


“The standard error for the dominance ratio is now very high, since the latter is proportional to the
difference f − p. If we assume a known value for $c_1$, and calculate the dominance ratio from p and $\mu$ only,
the standard error falls nearly to its value in Article 18.

“The three values for the ratio of the ancestral correlations 0.691, 0.640, 0.655 are now higher than that
obtained from observations of eye colour, and are more similar to the value 0.660 obtained for the coat colour
of horses. Without knowing the marital correlations in these cases, it is not possible to press the comparison
further. It would seem unlikely that the conscious choice of a mate is less influenced by eye colour than by
growth features, even by stature. But it is not at all unlikely that eye colour is but slightly correlated with
other features, while the growth features we know to be highly correlated, so that a relatively slight selection in
a number of the latter might produce a closer correlation in each of them than a relatively intense selection of
eye colour.

“The value of $c_1$ for span is still greater than unity, 1.033, but no longer unreasonably so, since the standard
error is about 0.078. If we were considering span alone the evidence would be strongly in favour of our third
hypothesis. A remarkable confirmation of this is that Pearson and Lee (loc. cit. p. 375), considering organic
and marital correlations alone, show that the observed correlations could be accounted for by the following
direct selection coefficients:

                 Stature    Span      Cubit
                 0.2374     0.0053    0.1043



Naturally these cannot be taken as final, since there are a large number of other features, which may be 
connected with these and at the same time may be subject to sexual selection. The correlations of cross 
assortative mating are in fact smaller than they would be if direct selection to this extent were actually taking 
place. The influence of other features prevents us from determining what proportion of the observed association 
is due to direct selection, but if inheritance in these growth features is capable of representation on a Mendelian 
scheme—and our results have gone far to show that this is likely—it would be possible to distinguish the two 
parts by comparing the parental and fraternal correlations with those for grandparents and other kindred. 

“On our present supposition that the association is primarily in z and for the case of span this seems likely,
the correlations for uncle and cousin will be the same as those for grandparent and great-grandparent, being
given by the formulae

$$c_1 c_2\left(\frac{1+A}{2}\right)^{2} \quad\text{and}\quad c_1 c_2\left(\frac{1+A}{2}\right)^{3},$$

leading to the numbers

                     Stature    Span      Cubit
Grandparent          0.3502     0.2907    0.2737
Great-grandparent    0.2421     0.1861    0.1793
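
The figures for the third hypothesis can be retraced in the same way. The sketch assumes the relations μ = c₁c₂A, p = ½c₁c₂(1+A) and c₁ = 4f − 2p − μ stated in Article 23, applied to the observed μ, p and f; the function names are illustrative.

```python
def third_hypothesis(mu, p, f):
    """Solve the Article 23 relations for c1c2, A, c1 and c2."""
    c1c2 = 2.0 * p - mu
    A = mu / c1c2
    c1 = 4.0 * f - 2.0 * p - mu
    c2 = c1c2 / c1
    return c1c2, A, c1, c2


def ancestral_third(c1c2, A, n):
    """n = 1 parent, n = 2 grandparent (and uncle), n = 3 great-grandparent."""
    return c1c2 * (0.5 * (1.0 + A)) ** n


OBSERVED = {  # (mu, p, f)
    "stature": (0.2804, 0.5066, 0.5433),
    "span": (0.1989, 0.4541, 0.5351),
    "cubit": (0.1977, 0.4180, 0.4619),
}

for name, (mu, p, f) in OBSERVED.items():
    c1c2, A, c1, c2 = third_hypothesis(mu, p, f)
    print(name, round(c1, 4), round(c2, 4),
          round(ancestral_third(c1c2, A, 2), 4),
          round(ancestral_third(c1c2, A, 3), 4))
```

For n = 1 the function returns the observed parental correlation p, which serves as a check on the algebra.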


Fisher now considers the hypothesis that the observed correlation $\mu$ between the phenotypes
x of the parents arises as the summation of two effects. The first is a direct correlation s, which is
the result of direct sexual selection. Fisher calls this the ‘coefficient of selection’. The second
part, $\mu - s$, is a reflection of a correlation between their z-values, arising differently. Each of these
parts can be treated as a regression coefficient. He thus supposes that the effect on a child is the
sum of the effects which arise by these two causes.

Now the direct correlation or regression s between the phenotypes x of the parents produces
a correlation $c_1 c_2 s$ between their z-values, as shown in Section 22, and hence a regression $c_1 c_2 s$
of the z-value of the father on that of the mother. The further correlation $\mu - s$ between the parents’
x-values is a reflection of a correlation $(\mu - s)/c_1 c_2$ between their z-values, as shown in Section 17,
and hence a regression $(\mu - s)/c_1 c_2$. If we suppose that these can be legitimately added together,
the total regression of one z-value on the other is

$$A = c_1 c_2 s + (\mu - s)/c_1 c_2$$

and this is equal to their correlation.

Similarly the direct correlation s produces a regression $\tfrac12 c_1 c_2(1 + s)$ of child on parent, and the
correlation $(\mu - s)/c_1 c_2$ between the z-values of the parents produces a further regression $(\mu - s)/2$.
Adding these, we find for the total regression of child on parent, which is the same as the correlation
between them,

$$p = \tfrac12 c_1 c_2(1 + s) + \tfrac12(\mu - s).$$

The argument by which Fisher deduces the value

$$f = \tfrac14 c_1(1 + c_2 + 2c_2 A) = \tfrac14 c_1 + \tfrac14 c_1 c_2(1 + 2A)$$

for the correlation between sibs still holds. From it we find

$$c_1 = 4f - c_1 c_2(1 + 2A).$$
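
As a hedged numerical illustration of these relations, the sketch below solves 2p = c₁c₂(1+s) + μ − s for c₁c₂, then evaluates A = c₁c₂s + (μ − s)/c₁c₂ and c₁ = 4f − c₁c₂(1+2A), taking for s the direct selection coefficients of Pearson and Lee quoted earlier (0.2374, 0.0053, 0.1043). The function name and data layout are illustrative only.

```python
def mixed_hypothesis(mu, p, f, s):
    """Part of the marital correlation (s) is direct selection on x;
    the remainder (mu - s) reflects an association between z-values."""
    c1c2 = (2.0 * p - mu + s) / (1.0 + s)
    A = c1c2 * s + (mu - s) / c1c2
    c1 = 4.0 * f - c1c2 * (1.0 + 2.0 * A)
    return c1c2, A, c1


CASES = {  # (mu, p, f, s)
    "stature": (0.2804, 0.5066, 0.5433, 0.2374),
    "span": (0.1989, 0.4541, 0.5351, 0.0053),
    "cubit": (0.1977, 0.4180, 0.4619, 0.1043),
}

for name, args in CASES.items():
    c1c2, A, c1 = mixed_hypothesis(*args)
    print(name, round(c1c2, 4), round(A, 4), round(0.5 * (1.0 + A), 4), round(c1, 4))
```

Setting s = μ recovers the first hypothesis and s = 0 the third, so the two earlier tables are the limiting cases of this calculation.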


“24. Neither these nor the similar table for the first hypothesis accord ill with the value obtained for
uncle and nephew, 0.265, from measurements of eye colour. It may, however, be thought that neither of
them give high enough value for cousins. Certainly they do not approach some of the values found by Miss
Elderton in her memoir on the resemblance of first cousins (Eugenics Laboratory Memoirs, IV). Series are there
found to give correlations over 0.5, and the mean correlation for the measured features is 0.336. From special
considerations this is reduced to 0.270, but if the similarity of first cousins is due to inheritance, it must certainly
be less than that between uncle and nephew. No theory of inheritance could make the correlation for cousins
larger than or even so large as that for the nearer relationship.

“It will be of interest finally to interpret our results on the assumption that the figures quoted (Article 20)
represent actual coefficients of selection. Manifestly it would be better to obtain the value of A experimentally
from the ratio of the ancestral correlations, using the collateral correlations to determine what are the marital
correlations for y. For the present we must neglect the possibility of an independent selection in y: and
although we know that the figures are not final, we shall write s, the coefficient of selection, equal to 0.2374,
0.0053, and 0.1043 in our three cases.

“Further, let

$$A = c_1 c_2 s + \frac{\mu - s}{c_1 c_2},$$

so that

$$2p = c_1 c_2(1 + s) + \mu - s,$$

whence we deduce

                       Stature    Span      Cubit
$c_1 c_2$              0.7841     0.7108    0.6725
$A$                    0.2410     0.2761    0.2090
$\tfrac12(1+A)$        0.6205     0.6381    0.6045

the values of A being now in much closer agreement for the three features. Further, from the fraternal
correlation we have

$c_1$                  1.0112     1.0370    0.8940

with a mean at 0.9821.

“Again, for the dominance ratio

                       0.2763     0.3880    0.2940    0.3194 (mean),

leaving a trifle under 2 per cent for causes not heritable, but requiring high values about 0.32 for the dominance
ratio.

“25. The Interpretation of the Statistical Effects of Dominance. The results which we have obtained, although
subject to large probable errors and to theoretical reservations which render an exact estimate of these errors
impossible, suggest that the ratio $\epsilon^2/\sigma^2$, the statistical measure of the extent of dominance, has values of about
0.25 to 0.38. In his initial memoir on this subject Karl Pearson has shown that, under the restricted conditions
there considered, this ratio should be exactly ⅓. Subsequently Udny Yule (Conference on Genetics) pointed
out that the parental correlation could be raised from the low values reached in that memoir to values more
in accordance with the available figures by the partial or total abandonment of the assumption of dominance.
To this view Professor Pearson subsequently gave his approval: but it does not seem to have been observed
that if lower values are required—and our analysis tends to show that they are not—the statistical effects are
governed not only by the physical ratio d/a, but by the proportions in which the three Mendelian phases are
present. This effect is an important one, and very considerably modifies the conclusions which we should draw
from any observed value of the dominance ratio.

“The fraction $\delta^2/\alpha^2$, of which the numerator and denominator are the contributions of a single factor to $\epsilon^2$
and $\sigma^2$, is equal, as we have seen (Article 5, equations V–VII) to

$$\frac{2pq\,d^2}{(p+q)^2a^2 - 2(p^2 - q^2)ad + (p^2 + q^2)d^2},$$

and depends wholly upon the two ratios d/a and p/q. We may therefore represent the variations of this function
by drawing the curves for which it has a series of constant values upon a plane, each point on which is specified
by a pair of particular values for these two ratios. The accompanying diagram (fig. 1) shows such a series of
curves, using d/a and log (p/q) as co-ordinates. The logarithm is chosen as a variable, because equal intensity of
selection will affect this quantity to an equal extent, whatever may be its value; it also possesses the great
advantage of showing reciprocal values of p/q in symmetrical positions.”


[Figure: curves of constant dominance ratio; vertical axis, values of d/a.]
Fig. 1. Values of $\log_{10}(p/q)$ (upper figures) and of p/q (lower figures).


The dominance ratio given above is obtained by simple substitution of $P = p^2$, $Q = pq$, $R = q^2$,
$p + q = 1$, into (V) and (VII).
In the paragraph below, the figure 3 is misprinted for 0.3.
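
The single-factor ratio, and the joint contribution that Fisher obtains further on by pooling two factors whose phases are in inverse proportions, are easy to evaluate directly. The sketch below is an illustration only; the function names are not from the original, and p and q are treated as relative (not necessarily normalised) proportions, as in Fisher's homogeneous form of the expression.

```python
def single_factor_ratio(p, q, d, a):
    """delta^2 / alpha^2 for one factor with phase proportions p : q."""
    num = 2.0 * p * q * d * d
    den = (p + q) ** 2 * a * a - 2.0 * (p * p - q * q) * a * d + (p * p + q * q) * d * d
    return num / den


def joint_ratio(p, q, d, a):
    """Pooled ratio for two similar factors with proportions p : q and q : p."""
    return 2.0 * p * q * d * d / ((p + q) ** 2 * a * a + (p * p + q * q) * d * d)


# Complete dominance (d = a) with equal proportions gives Pearson's value 1/3.
print(single_factor_ratio(0.5, 0.5, 1.0, 1.0))

# Complete dominance with p : q = 100 : 1 gives a pooled ratio below 0.02.
print(joint_ratio(100.0, 1.0, 1.0, 1.0))
```

Swapping p and q in `single_factor_ratio` changes the sign of the middle term of the denominator, which is why that term cancels when the numerators and denominators of the two fractions are added.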


“It will be seen that 3 is not by any means the highest value possible: when d = a, and when p/q is very
great, any value up to unity may appear; but high values are confined to this restricted region. When d/a is
less than 0.3 the ratio is never greater than 0.05, and we cannot get values as high as 0.15 unless d/a be as great
as 0.5. On the other hand, all values down to zero are consistent with complete dominance, provided that the
values of p/q are sufficiently small.

“We know practically nothing about the frequency distribution of these two ratios. The conditions under 
which Mendelian factors arise, disappear, or become modified are unknown. It has been suggested that they 
invariably arise as recessive mutations in a dominant population. In that case p/q would initially be very high,
and could only be lowered if by further mutation, and later by selection, the recessive phase became more 
frequent. These factors would, however, have little individual weight if better balanced factors were present, 
until p/q had been lowered to about 10. In face of these theories it cannot be taken for granted that the 
distribution of these ratios is a simple one. It is natural, though possibly not permissible, to think of their 
distributions as independent. We may profitably consider further the case in which the distribution is sym- 
metrical, in which the factor of known a and d is equally likely to be more frequent in the dominant as in the 


recessive phase. 
“For this case we combine the numerators and denominators of the two fractions

$$\frac{2pq\,d^2}{(p+q)^2a^2 - 2(p^2 - q^2)ad + (p^2 + q^2)d^2} \quad\text{and}\quad \frac{2pq\,d^2}{(p+q)^2a^2 + 2(p^2 - q^2)ad + (p^2 + q^2)d^2},$$

and obtain the joint contribution

$$\frac{2pq\,d^2}{(p+q)^2a^2 + (p^2 + q^2)d^2},$$

the curves for which are shown in fig. 2, representing the combined effect of two similar factors, having their
phases in inverse proportions. It will be seen that complete dominance does not preclude the possibility of low
value for the dominance ratio: the latter might fall below 0.02 if the greater part of the variance were contributed
by factors having the ratio between p and q as high as 100 to 1. This ratio is exceedingly high; for such
a factor only one individual in 10,000 would be a recessive. We may compare the frequency of deaf mutism
with which about one child in 4000 of normal parents is said to be afflicted. It would be surprising if more
equal proportions were not more common, and if this were so, they would have by far the greater weight.

“The fact that the same intensity of selection affects the logarithm of p/q equally, whatever its value may 
be, suggests that this function may be distributed approximately according to the law of errors. This is a 
natural extension of the assumption of symmetry, and is subject to the same reservations. For instance, a 
factor in which the dominant phase is the commonest would seem less likely to suffer severe selection than one 
in which the recessive phase outnumbers the other. But if symmetry be granted, our choice of a variable 
justifies the consideration of a normal distribution. 

“Writing $\xi$ for $\log_e p/q$ and $\sigma$ for the standard deviation of $\xi$, we have

$$p = \frac{e^{\frac12\xi}}{2\cosh\frac12\xi}, \qquad q = \frac{e^{-\frac12\xi}}{2\cosh\frac12\xi} \qquad\text{and}\qquad 2pq = \tfrac12\operatorname{sech}^2\tfrac12\xi.$$


[Figure: curves of constant dominance ratio for the joint contribution; vertical axis, values of d/a.]
Fig. 2. Values of $\log_{10}(p/q)$ (upper figures) and of p/q (lower figures).


“Hence we have to evaluate

$$E = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}\tfrac12\operatorname{sech}^2\tfrac12\xi\; e^{-\xi^2/2\sigma^2}\,d\xi = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tfrac12\operatorname{sech}^2\tfrac12\sigma\xi\; e^{-\frac12\xi^2}\,d\xi, \qquad \text{(XXVIII)}$$

and the dominance ratio derived from the whole group is

$$\frac{E d^2}{a^2 + (1 - E)d^2}.$$

“E is a function of $\sigma$ only, which decreases steadily from its value ½ when $\sigma = 0$, approaching when $\sigma$ is
large to the function $2/(\sigma\sqrt{2\pi})$. The function $(16 + 16\sigma^2 + \tfrac14\pi^2\sigma^4)^{-1/4}$ osculates it at the origin, and appears on
trial to represent it effectively to three significant figures. This function has been used for calculating the form
of the accompanying curves. Fig. 3 shows the course of the function E. Fig. 4 gives the curves comparable to
those of figs. 1 and 2, showing the value of the dominance ratio for different values d/a and $\sigma$. If the assumptions
upon which this diagram is based are justified, we are now advanced some way towards the interpretation
of an observed dominance ratio. A ratio of 0.25 gives us a lower limit of about 0.8 for d/a, and no upper limit.
If the possibility of superdominance (d > a) is excluded, then the ratio of the phases must be so distributed
that the standard ratio $e^{\sigma}$ is not greater than about 3:1. A greater value of the standard ratio would make
the effect of dominance too small; a smaller value could be counteracted by a slight reduction of d/a. We have
therefore no reason to infer from our dominance ratios that dominance is incomplete. We may speak of it as
having at least four-fifths of its full value, but we can set no upper limit to it.


[Fig. 3: graph with vertical axis ‘Values of E’.]

[Fig. 4: graph with vertical axis ‘Values of d/a’.]
Fig. 4. Values of $\log_{10}$ of standard ratio (upper figures) and of standard ratio (lower figures).
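
The behaviour of the function E can be explored numerically. The sketch below evaluates the integral (XXVIII) by Simpson's rule and compares it with the closed-form approximation, taken here to be (16 + 16σ² + ¼π²σ⁴)^(−1/4). That form is an assumption, chosen because it equals ½ at σ = 0 and tends to 2/(σ√(2π)) for large σ as the text requires. The function names, integration range and step count are illustrative choices.

```python
import math


def E_numeric(sigma, half_width=12.0, steps=4000):
    """Simpson's rule for (1/sqrt(2*pi)) * integral over xi of
    0.5 * sech^2(0.5*sigma*xi) * exp(-xi^2 / 2)."""
    def integrand(xi):
        return 0.5 / math.cosh(0.5 * sigma * xi) ** 2 * math.exp(-0.5 * xi * xi)

    h = 2.0 * half_width / steps
    total = integrand(-half_width) + integrand(half_width)
    for k in range(1, steps):
        weight = 4.0 if k % 2 else 2.0
        total += weight * integrand(-half_width + k * h)
    return total * h / 3.0 / math.sqrt(2.0 * math.pi)


def E_approx(sigma):
    """Assumed closed-form approximation to E."""
    return (16.0 + 16.0 * sigma ** 2 + 0.25 * math.pi ** 2 * sigma ** 4) ** -0.25


def group_dominance_ratio(E, d_over_a):
    """E*d^2 / (a^2 + (1 - E)*d^2), expressed in terms of d/a."""
    r2 = d_over_a ** 2
    return E * r2 / (1.0 + (1.0 - E) * r2)


for sigma in (0.0, 0.5, 1.0, 2.0):
    print(sigma, round(E_numeric(sigma), 4), round(E_approx(sigma), 4))

# With E at its maximum of 1/2, a ratio of 0.25 needs d/a = sqrt(2/3), about 0.8.
print(group_dominance_ratio(0.5, math.sqrt(2.0 / 3.0)))
```

With complete dominance (d/a = 1) the ratio reduces to E/(2 − E), so a ratio of 0.25 corresponds to E = 0.4; this is the condition behind the bound on the standard ratio quoted in the text.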


“26. Throughout this work it has been necessary not to introduce any avoidable complications, and for this 
reason the possibilities of Epistacy have only been touched upon, and small quantities of the second order have 
been steadily ignored. In spite of this, it is believed that the statistical properties of any feature determined 
by a large number of Mendelian factors have been successfully elucidated. Due allowance has been made for 
the factors differing in the magnitude of their effects, and in their degree of dominance, for the possibility of 
Multiple Allelomorphism and of one important type of Coupling. The effect of the dominance in the individual 
factors has been seen to express itself in a single Dominance Ratio. Further the effect of marital correlation has 
been fully examined, and the relation between this association and the coefficient of marital correlation has 
been made clear.

“By means of the paternal correlation it is possible to ascertain the dominance ratio and so distinguish
dominance from all non-genetic causes, such as environment, which might tend to lower the correlations: this 
is due to the similarity in siblings of the effects of dominance which causes the fraternal correlation to exceed 
the parental. The fact that this excess of the fraternal correlation is very generally observed is itself evidence 
in favour of the hypothesis of cumulative factors. On this hypothesis it is possible to calculate the numerical 
influence not only of dominance, but of the total genetic and non-genetic causes of variability. An examination 
of the best available figures for human measurements shows that there is little or no indication of non-genetic 
causes. The closest scrutiny is invited on this point, not only on account of the practical importance of the 
predominant influence of natural inheritance, but because the significance of the fraternal correlation in this 
connection has not previously been realised. 
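
The role of the fraternal excess can be illustrated with a small sketch. It uses the simplified random-mating decomposition, not the full formulae of the paper: it assumes no assortative mating, no epistacy and no environment shared by siblings, so that the parental correlation is A/2 and the fraternal correlation is A/2 + D/4, where A and D are the additive and dominance fractions of the total variance. The function name and the input correlations are hypothetical and chosen only for illustration.

```python
def decompose_variance(r_parental, r_fraternal):
    """Split total variance into additive, dominance and residual fractions.

    Assumption (not Fisher's full treatment): random mating, no epistacy,
    no shared sibling environment, so that
        r_parental  = A / 2
        r_fraternal = A / 2 + D / 4
    """
    additive = 2.0 * r_parental
    dominance = 4.0 * (r_fraternal - r_parental)
    # Whatever is left over is attributed to non-genetic causes.
    residual = 1.0 - additive - dominance
    return additive, dominance, residual


# Hypothetical correlations, not taken from any published table.
additive, dominance, residual = decompose_variance(0.40, 0.45)
print(f"additive={additive:.2f} dominance={dominance:.2f} residual={residual:.2f}")
```

Under these assumptions a fraternal correlation above the parental one points to dominance, whereas environmental variation would lower both correlations; this is why the two effects can be told apart.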

“Some ambiguity still remains as to the causes of marital correlations; our numerical conclusions are 
considerably affected according as this is assumed to be of purely somatic or purely genetic origin. It is striking 
that the indications of the present analysis are in close agreement with the conclusions of Pearson and Lee as 
to the genetic origin of a part of the marital correlation, drawn from the effect of the correlation of one organ 
with another in causing the selection of one organ to involve the selection of another. This difficulty will, it is 
hoped, be resolved when accurate determinations are available of the ratio of the grandparental to the parental 
correlation. From this ratio the degree of genetic association may be immediately obtained, which will make 
our analysis of the Variance as precise as the probable errors will allow. 
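
A minimal sketch of how the genetic association might be read off from this ratio is given below. It assumes the relation that each further ancestral generation multiplies the correlation by (1 + A)/2, where A denotes the degree of genetic association, so that under random mating (A = 0) the ratio is one half. The function name and the numerical inputs are hypothetical.

```python
def genetic_association(r_grandparental, r_parental):
    """Estimate the genetic association A from two ancestral correlations.

    Assumption: r_grandparental / r_parental = (1 + A) / 2,
    hence A = 2 * ratio - 1.
    """
    if r_parental == 0:
        raise ValueError("parental correlation must be non-zero")
    ratio = r_grandparental / r_parental
    return 2.0 * ratio - 1.0


# Hypothetical correlations, for illustration only.
print(genetic_association(0.30, 0.50))
```

Because A depends on a ratio of two estimated correlations, its precision is limited by the sampling errors of both, which is why accurate determinations are called for.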

“In general, the hypothesis of cumulative Mendelian factors seems to fit the facts very accurately. The only 
marked discrepancy from existing published work lies in the correlation for first cousins. Snow, owing 
apparently to an error, would make this as high as the avuncular correlation; in our opinion it should differ 
by little from that of the great-grandparent. The values found by Miss Elderton are certainly extremely high, 
but until we have a record of complete cousinships measured accurately and without selection, it will not be 
possible to obtain satisfactory numerical evidence on this question. As with cousins, so we may hope that more 
extensive measurements will gradually lead to values for the other relationship correlations with smaller 
standard errors. Especially would more accurate determinations of the fraternal correlation make our 
conclusions more exact. 
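
The comparison of cousins with great-grandparents can be made concrete with a short sketch. It assumes the simplest case only: random mating and purely additive effects, so that each correlation is the additive fraction of the variance multiplied by the usual coefficient for the relationship. Dominance and assortative mating, which the paper treats in full, are ignored here, and the table and function names are hypothetical.

```python
from fractions import Fraction

# Coefficients under random mating with purely additive effects (assumption).
ADDITIVE_COEFFICIENT = {
    "parent": Fraction(1, 2),
    "grandparent": Fraction(1, 4),
    "great-grandparent": Fraction(1, 8),
    "uncle or aunt": Fraction(1, 4),
    "first cousin": Fraction(1, 8),
}


def expected_correlation(relationship, additive_fraction):
    """Expected correlation for a relationship, given the additive fraction."""
    return float(ADDITIVE_COEFFICIENT[relationship]) * additive_fraction


# Hypothetical additive fraction, for illustration only.
for name in ADDITIVE_COEFFICIENT:
    print(name, expected_correlation(name, 0.8))
```

In this simplified case the first-cousin value coincides with that of the great-grandparent and is half the avuncular value, which agrees with the view that a cousin correlation as high as the avuncular one is suspect.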


“Finally, it is a pleasure to acknowledge my indebtedness to Major Leonard Darwin, at whose suggestion 
this inquiry was first undertaken, and to whose kindness and advice it owes its completion.” 